ABSTRACT cDNA microarray technology is used to profile complex diseases and discover novel disease-related genes. In inflammatory diseases such as rheumatoid arthritis, expression patterns of diverse cell types contribute to the pathology. We have monitored gene expression in this disease state with a microarray of selected human genes of probable significance in inflammation, as well as with genes expressed in peripheral human blood cells. Messenger RNA from cultured macrophages, chondrocyte cell lines, primary chondrocytes, and synoviocytes provided expression profiles for the selected cytokines, chemokines, DNA binding proteins, and matrix-degrading metalloproteinases. Comparisons between tissue samples of rheumatoid arthritis and inflammatory bowel disease verified the involvement of many genes and revealed novel participation of the cytokine interleukin 3, the chemokine Groα, and the metalloproteinase matrix metallo-elastase in both diseases. From the peripheral blood library, the tissue inhibitor of metalloproteinase 1, ferritin light chain, and manganese superoxide dismutase genes were identified as expressed differentially in rheumatoid arthritis compared with inflammatory bowel disease. These results demonstrate the use of the cDNA microarray system as a general approach for dissecting human diseases.



The recently described cDNA microarray or DNA-chip technology allows expression monitoring of hundreds to thousands of genes simultaneously and provides a format for identifying genes as well as changes in their activity (1, 2). Using this technology, two-color fluorescence patterns of differential gene expression in the root versus the shoot tissue of Arabidopsis were obtained in a specific array of 48 genes (1). In another study using a 1000-gene array from a human peripheral blood library, novel genes expressed by T cells were identified upon heat shock and protein kinase C activation (3).

In this technology, cDNA sequences or cDNA inserts from a library are amplified by PCR and arrayed on a glass slide by high-speed robotics at a density of 1000 cDNA sequences per cm². These microarrays serve as gene targets for hybridization to cDNA probes prepared from RNA samples of cells or tissues. A two-color fluorescence labeling technique is used in preparing the cDNA probes, so that simultaneous hybridization with separate detection of the two signals provides a comparative analysis of the relative abundance of the specific genes expressed (1, 2). Microarrays can be constructed from specific cDNA clones of interest, a cDNA library, or a select number of open reading frames from a genome sequencing database to allow large-scale functional analysis of expressed sequences.
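As a rough check on what a density of 1000 cDNA sequences per cm² implies, the arithmetic below assumes the targets sit on a square grid (an assumption; the grid geometry is not specified here) and computes the area available to each target and the corresponding center-to-center spacing.

```latex
A_{\text{target}} = \frac{1\ \text{cm}^2}{1000} = \frac{100\ \text{mm}^2}{1000} = 0.1\ \text{mm}^2,
\qquad
d = \sqrt{A_{\text{target}}} = \sqrt{0.1}\ \text{mm} \approx 0.32\ \text{mm} \approx 316\ \mu\text{m}
```

Under that assumption, each spot must be smaller than roughly 0.3 mm across to leave space between neighboring targets.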
Because of the wide spectrum of genes and endogenous mediators involved, microarray technology is well suited for analyzing chronic diseases. In rheumatoid arthritis (RA), inflammation of the joint is caused by the gene products of many different cell types present in the synovium and cartilage tissues, plus those infiltrating from the circulating blood. The autoimmune and inflammatory nature of the disease is a cumulative result of genetic susceptibility factors and multiple responses, paracrine and autocrine in nature, from macrophages, T cells, plasma cells, neutrophils, synovial fibroblasts, chondrocytes, etc. Growth factors, inflammatory cytokines (4), and chemokines (5) are important mediators of this inflammatory process. The ensuing destruction of cartilage and bone by the invading synovial tissue involves the actions of prostaglandins and leukotrienes (6) and of the matrix-degrading metalloproteinases (MMPs). The MMPs are an important class of Zn-dependent metalloendoproteinases that can collectively degrade the proteoglycan and collagen components of the connective tissue matrix (7).

This paper presents a study in which the involvement of select classes of molecules in RA was examined. Also investigated were 1000 human genes randomly selected from a peripheral human blood cell library. Their differential and quantitative expression was analyzed in cells of the joint tissue, in diseased RA tissue, and in inflammatory bowel disease (IBD) tissue to demonstrate the utility of the microarray method for analyzing complex diseases by their pattern of gene expression. Such a survey provides insight not only into the underlying cause of the pathology but also an opportunity to select target genes for disease intervention through appropriate drug development and gene therapies.

METHODS 

Microarray Design, Development, and Preparation. Two approaches for the fabrication of cDNA microarrays were used in this study. In the first approach, known human genes of probable significance in RA were identified. Regions of the clones, preferably 1 kb in length, were selected for their proximity to the 3' end of the cDNA and for least identity to related and repetitive sequences. Primers were synthesized to amplify the target regions by standard PCR protocols (3). Products were verified by gel electrophoresis, purified with the Qiaquick 96-well purification kit (Qiagen, Chatsworth, CA), lyophilized (Savant), and resuspended in 5 µl of 3× standard saline citrate (SSC) buffer for arraying. In the second approach, the microarray containing the 1056 human genes from the peripheral blood lymphocyte library was prepared as described (3).

Abbreviations: RA, rheumatoid arthritis; MMP, matrix-degrading metalloproteinase; IBD, inflammatory bowel disease; LPS, lipopolysaccharide; PMA, phorbol 12-myristate 13-acetate; TNF-α, tumor necrosis factor α; IL, interleukin; TGF-β, transforming growth factor β; GCSF, granulocyte colony-stimulating factor; MIP, macrophage inflammatory protein; MIF, migration inhibitory factor; HME, human matrix metallo-elastase; RANTES, regulated upon activation, normal T cell expressed and secreted; Gel, gelatinase; VCAM, vascular cell adhesion molecule; ICE, IL-1 converting enzyme; PUMP, putative metalloproteinase; MnSOD, manganese superoxide dismutase; TIMP, tissue inhibitor of metalloproteinase; MCP, macrophage chemotactic protein.

Tissue Specimens. Rheumatoid synovial tissue was obtained from patients with late stage classic RA undergoing remedial synovectomy or arthroplasty of the knee. Synovial tissue was separated from any associated connective tissue or fat. One gram of each synovial specimen was subjected to RNA extraction within 40 min of surgical excision, or explants were cultured in serum-free medium to examine any changes under in vitro conditions. For IBD, specimens of macroscopically inflamed lower intestinal mucosa were obtained from patients with Crohn disease undergoing remedial surgery. The hypertrophied mucosal tissue was separated from underlying connective tissue and extracted for RNA.

Cultured Cells. The Mono Mac-6 (MM6) monocytic cells were grown in RPMI medium. Human chondrosarcoma SW1353 cells, primary human chondrocytes, and synoviocytes (9, 10) were cultured in DMEM; all culture media were supplemented with 10% fetal bovine serum, 100 µg/ml streptomycin, and 500 units/ml penicillin. Treatment of cells with lipopolysaccharide (LPS) endotoxin at 30 ng/ml, phorbol 12-myristate 13-acetate (PMA) at 50 ng/ml, tumor necrosis factor α (TNF-α) at 50 ng/ml, interleukin (IL)-1β at 30 ng/ml, or transforming growth factor β (TGF-β) at 100 ng/ml is described in the figure legends.
Fluorescent Probe, Hybridization, and Scanning. Isolation of mRNA, probe preparation, and quantitation with Arabidopsis control mRNAs were essentially as described (3), with the following minor modification. Following the reverse transcriptase step, the appropriate labeled samples were pooled, and the mRNA was degraded by heating the sample to 65°C for 10 min with the addition of 5 µl of 0.5 M NaOH plus 0.5 ml of 10 mM EDTA. The pooled cDNA was purified from unincorporated nucleotides by gel filtration in Centri-spin columns (Princeton Separations, Adelphia, NJ). Samples were lyophilized and dissolved in 6 µl of hybridization buffer (5× SSC plus 0.2% SDS). Hybridizations, washes, scanning, quantitation procedures, and pseudocolor representations of fluorescent images have been described (3). Scans for the two fluorescent probes were normalized either to the fluorescence intensity of Arabidopsis mRNAs spiked into the labeling reactions (see Figs. 2-4) or to the signal intensity of β-actin and glyceraldehyde-3-phosphate dehydrogenase (GAPDH; see Fig. 5).
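The two-channel normalization described above can be sketched in Python. This is a minimal illustration, not the procedure of reference (3): it assumes background-subtracted Cy3 and Cy5 intensities for each arrayed target and a single global scale factor computed from the summed signals of the control targets (spiked Arabidopsis mRNAs or housekeeping genes such as GAPDH). The function names and spot labels are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical data layout: background-subtracted (Cy3, Cy5) intensities per target.
Spots = Dict[str, Tuple[float, float]]


def channel_scale(spots: Spots, controls: Iterable[str]) -> float:
    """Factor that brings the Cy5 channel onto the Cy3 scale.

    Assumes the control targets should give equal signal in both channels,
    so any imbalance in their summed intensities reflects labeling or
    scanning differences rather than biology.
    """
    names = list(controls)
    cy3_total = sum(spots[name][0] for name in names)
    cy5_total = sum(spots[name][1] for name in names)
    if cy3_total <= 0 or cy5_total <= 0:
        raise ValueError("control signal must be positive in both channels")
    return cy3_total / cy5_total


def normalized_ratios(spots: Spots, controls: Iterable[str]) -> Dict[str, float]:
    """Scaled Cy5/Cy3 ratio for every target with a positive Cy3 signal."""
    scale = channel_scale(spots, controls)
    return {
        name: (cy5 * scale) / cy3
        for name, (cy3, cy5) in spots.items()
        if cy3 > 0
    }
```

A single global factor is the simplest choice; with several controls spread across the intensity range, an intensity-dependent correction would be a reasonable refinement.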

RESULTS 

Ninety-Six-Gene Microarray Design. The actions of cytokines, growth factors, chemokines, transcription factors, MMPs, prostaglandins, and leukotrienes are well recognized in inflammatory disease, particularly RA (11-14). Fig. 1 displays the genes selected for this study; it also includes control cDNAs of housekeeping genes such as β-actin and GAPDH and genes from Arabidopsis for signal normalization and quantitation (row A, columns 1-12).

Defining Microarray Assay Conditions. Different lengths and concentrations of target DNA were tested by arraying PCR-amplified products ranging from 0.2 to 1.2 kb at concentrations of 1 µg/µl or less. No significant difference in signal levels was observed within this range of target size, and only with the 0.2-kb length was the signal reduced upon an 8-fold dilution of the 1 µg/µl sample (data not shown). In this study the average length of the targets was 1 kb, with a few exceptions in the range of ~300 bp, arrayed at a concentration of 1 µg/µl. Normally one PCR provided sufficient material to fabricate up to 1000 microarray targets. In considering positional effects in the development of the targets for the microarrays, selection was biased toward the 3'-proximal regions, because the signal was reduced if the target fragment was biased toward the 5' end (data not shown). This result was anticipated, since the hybridizing probe is prepared by reverse transcription of oligo(dT)-primed mRNA and is richer in 3'-proximal sequences. Cross-hybridization of probes to targets of a gene family was analyzed with the matrix metalloproteinases as the example, because they can show regions of sequence identity greater than 70%. With collagenase-1 (Col-1) and collagenase-2 (Col-2) genes as targets, with up to 70% sequence identity, and stromelysin-1 (Strom-1) and stromelysin-2 (Strom-2) genes, with different degrees of identity, our results showed that a short region of overlap, even with 70-90% sequence identity, produced a low level of cross-hybridization. However, shorter regions of identity spread over the length of the target resulted in cross-hybridization (data not shown). For closely related genes, targets were designed to avoid long stretches of homology. For members of a gene family, two or more target regions were included to distinguish specific signal from cross-hybridization.
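The cross-hybridization observations above suggest screening candidate targets for stretches of homology to related family members. The sketch below is a hypothetical illustration, not the design procedure used in this study: it assumes the candidate target and a related gene have already been aligned to equal length (gaps written as '-'), scores percent identity in a sliding window, and reports the longest run of consecutive windows at or above a threshold. The 50-bp window and 70% threshold are illustrative defaults, not values from the paper.

```python
from typing import List


def window_identity(a: str, b: str, window: int = 50) -> List[float]:
    """Fraction of identical, ungapped positions in each sliding window.

    Hypothetical helper: `a` and `b` must already be aligned to the same
    length, with gaps written as '-'. One score is returned per window start.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if window <= 0 or window > len(a):
        raise ValueError("window must be between 1 and the alignment length")
    matches = [int(x == y and x != "-") for x, y in zip(a.upper(), b.upper())]
    total = sum(matches[:window])
    scores = [total / window]
    # Slide the window one position at a time, updating the match count.
    for i in range(window, len(matches)):
        total += matches[i] - matches[i - window]
        scores.append(total / window)
    return scores


def longest_high_identity_stretch(scores: List[float], threshold: float = 0.7) -> int:
    """Length, in consecutive windows, of the longest run at or above threshold."""
    best = run = 0
    for s in scores:
        run = run + 1 if s >= threshold else 0
        best = max(best, run)
    return best
```

A long run of high-identity windows flags a target region to avoid; scattered short peaks, which the study found could also cross-hybridize, would need a separate count of windows above threshold.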

Monitoring Differential Expression in Cultured Cell Lines. In RA tissue, the monocyte/macrophage population plays a prominent role in phagocytic and immunomodulatory activities. Typically these cells, when triggered by an immunogen, produce the proinflammatory cytokines TNF and IL-1. We used the monocyte cell line MM6 and monitored changes in gene expression upon activation with LPS endotoxin, a component of Gram-negative bacterial membranes, and PMA, which augments the action of LPS on TNF production (15). RNA was isolated at different times after induction and used for cDNA probe preparation. From this time course it was clear that TNF expression was induced within 15 min of treatment, reached maximum levels in 1 hr, remained high until 4 hr, and subsequently declined (Fig. 2A). Many other cytokine genes were also transiently activated, such as IL-1α and -β, IL-6, and granulocyte colony-stimulating factor (GCSF). Prominent chemokines activated were IL-8, macrophage inflammatory protein (MIP)-1β (more so than MIP-1α), and Groα or melanoma growth stimulatory factor. Migration inhibitory factor (MIF), expressed in the uninduced state, declined in LPS-activated cells. Of the immediate early genes, the noticeable ones were c-fos, fra-1, c-jun, NF-κB p50, and IκB, with c-rel expression observed even in the uninduced state (Fig. 1B). These expression patterns are consistent with reported patterns of activation of certain LPS- and PMA-induced genes (12). Demonstrated here is the unique ability of this system to allow parallel visualization of a large number of gene activities over a period of time.
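The time-course reading described above (onset of induction, time of maximal expression) can be expressed as a small calculation over normalized signals. This is a hypothetical sketch: it assumes one normalized signal per time point with a positive t = 0 (uninduced) measurement, and the 2-fold induction cutoff and function names are illustrative assumptions rather than criteria from the study.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: (hours after treatment, normalized signal).
TimePoint = Tuple[float, float]


def fold_vs_time_zero(series: List[TimePoint]) -> List[TimePoint]:
    """Express each time point as fold change over the t = 0 (uninduced) sample."""
    points = sorted(series)
    t0, baseline = points[0]
    if t0 != 0 or baseline <= 0:
        raise ValueError("series must start with a positive t = 0 measurement")
    return [(t, signal / baseline) for t, signal in points]


def peak_time(folds: List[TimePoint]) -> float:
    """Time of maximal fold change (the earliest one if there is a tie)."""
    return max(folds, key=lambda p: p[1])[0]


def first_induced(folds: List[TimePoint], min_fold: float = 2.0) -> Optional[float]:
    """Earliest post-treatment time reaching `min_fold`, or None if never reached."""
    for t, f in folds:
        if t > 0 and f >= min_fold:
            return t
    return None
```

With invented values shaped like the TNF description (rising by 15 min, peaking at 1 hr, declining after 4 hr), `peak_time` returns 1 and `first_induced` returns 0.25; these numbers are illustrative, not data from Fig. 2.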

SW1353 is a cell line derived from a malignant tumor of the cartilage; it behaves much like chondrocytes in its expression of MMPs upon stimulation with TNF and IL-1 (9). In addition to confirming our earlier observations with Northern blots on Strom-1, Col-1, and Col-3 expression (9), we found that gelatinase (Gel) A, putative metalloproteinase (PUMP)-1, membrane-type matrix metalloproteinase, and tissue inhibitors of metalloproteinases 1 (TIMP-1), 2, and 3 were also expressed by these cells, together with human matrix metallo-elastase (HME; Fig. 3A). HME induction was estimated to be ~50-fold, greater than that of any of the other MMPs examined (Fig. 3B). This result was unexpected because HME is reportedly expressed only by alveolar macrophages and placental cells (16). Expression of the cytokines and chemokines IL-6, IL-8, MIF, and MIP-1β was also noted. A variety of other genes, including certain transcription factors, were also up-regulated (Fig. 3), but the overall time-dependent expression of genes in the SW1353 cells was qualitatively distinct from that in the MM6 cells.

Fig. 3. Time course for IL-1β- and TNF-induced SW1353 cells using the inflammation array (Fig. 1). (A) Pseudocolor representations of fluorescent scans correspond to gene expression levels at each time point. (B I-IV) Relative levels of selected genes at different time points compared with time zero.

Quantitation of differential gene expression (Figs. 1B and 3B) was achieved by simultaneous hybridization of Cy3-labeled cDNA from untreated cells and Cy5-labeled cDNA from treated samples. The estimated increases in expression from these microarrays for a select number of genes, including IL-1β, IL-8, MIP-1β, TNF, HME, Col-1, Col-3, Strom-1, and Strom-2, were compared with data collected from dot blot analysis. The results (not shown) were in close agreement and confirmed our earlier observations on the use of the microarray method for the quantitation of gene expression (3).
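One way to put a number on the agreement between microarray and dot blot estimates of induction is the Pearson correlation of their log2 fold changes. The study reports only that the two methods were in close agreement, so this metric and the hypothetical function below are an illustrative assumption; all fold changes must be positive.

```python
import math
from typing import Sequence


def log2_agreement(array_folds: Sequence[float], blot_folds: Sequence[float]) -> float:
    """Pearson correlation between log2 fold changes from two assays.

    Hypothetical helper: both inputs list fold changes for the same genes
    in the same order. A value near 1 means the assays rank and scale the
    inductions consistently, even if one assay reads systematically higher.
    """
    if len(array_folds) != len(blot_folds) or len(array_folds) < 2:
        raise ValueError("need two equal-length lists with at least two genes")
    xs = [math.log2(v) for v in array_folds]
    ys = [math.log2(v) for v in blot_folds]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("fold changes must vary across genes")
    return sxy / math.sqrt(sxx * syy)
```

Working in log2 space treats a 2-fold increase and a 2-fold decrease symmetrically, and a constant multiplicative offset between the assays does not lower the correlation.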

Expression Profiles in Primary Chondrocytes and Synoviocytes of Human RA Tissue. Given the sensitivity and specificity of this method, expression profiles of primary synoviocytes and chondrocytes from diseased tissue were examined. Without prior exposure to inducing agents, low-level expression of c-jun, GCSF, IL-3, TNF-β, MIF, and RANTES (regulated upon activation, normal T cell expressed and secreted) was seen, as well as expression of the MMPs GelA, Strom-1, and Col-1 and the three TIMPs. In this case, Col-2 hybridization was considered nonspecific because the second Col-2 target, taken from the 3' end of the gene, gave no signal. Treatment with PMA and IL-1, more so than with TNF and IL-1, produced a dramatic up-regulation in expression of several genes in both of these primary cell types. These genes are as follows: the cytokine IL-6; the chemokines IL-8 and Groα; the MMPs Strom-1, Col-1, Col-3, and HME; and the adhesion molecule vascular cell adhesion molecule 1 (VCAM-1). The surprise again is HME expression in these primary cells, for the reasons discussed above. From these results, the expression profiles of the synoviocytes and the chondrocytes appear very similar; the differences are more quantitative than qualitative. Treatment of the primary chondrocytes with the anabolic growth factor TGF-β produced an interesting profile: a marked down-regulation of genes expressed in both the untreated and induced states (Fig. 4).

Fig. 4. Expression profiles for early passage primary synoviocytes (A, human synovial fibroblasts) and chondrocytes (B, human articular chondrocytes) isolated from RA tissue, cultured in the presence of 10% fetal calf serum and activated with PMA and IL-1β, TNF and IL-1β, or TGF-β for 18 hr. The color bars provide a comparative calibration scale between arrays and are derived from the Arabidopsis mRNA samples that are introduced in equal amounts during probe preparation.

Given the demonstrated effectiveness of this technology, a comparative analysis of two different inflammatory disease states was conducted with probes made from RA and IBD tissue samples. RA samples were from late stage rheumatoid synovial tissue, and IBD specimens were obtained from inflamed lower intestinal mucosa of patients with Crohn disease. With both the 96-element known-gene microarray and the 1000-gene microarray of cDNAs selected from a peripheral human blood cell library (3), distinct differences in gene expression patterns were evident. On the 96-gene array, RA tissue samples from different affected individuals gave similar profiles (data not shown), as did different samples from the same individual (Fig. 5). These patterns were notably similar to those observed with primary synoviocytes and chondrocytes (Fig. 4). Included in the list of prominently up-regulated genes are IL-6; the MMPs Strom-1, Col-1, GelA, and HME; in all samples, the TIMPs, particularly TIMP-1 and TIMP-3; and the adhesion molecule VCAM. Discernible levels of macrophage chemotactic protein 1 (MCP-1), MIF, and RANTES were also noted. In comparison, the IBD samples were rather subdued, although IL-1 converting enzyme (ICE), TIMP-1, and MIF were notable in all three IBD samples examined here. In IBD-A, one of the three individual samples, ICE, VCAM, Groα, and MMP expression was more pronounced than in the others.

Fig. 5. Expression profiles of RA tissue (A) and IBD tissue (B). mRNA from RA tissue samples obtained from the same individual was isolated directly after excision (RA 21.5A) or after the tissue was maintained in culture without serum for 2 hr (RA 21.5B) or 6 hr (RA 21.5C). Profiles of samples from two other individuals (data not shown) were remarkably similar to the ones shown here. IBD-A and IBD-CI are from samples prepared directly after surgery from two separate individuals. For the IBD-CII probe, the tissue sample was cultured in medium without serum for 2 hr before mRNA preparation.

We also made use of a peripheral blood cDNA library (3) to identify genes expressed by lymphocytes infiltrating the inflamed tissues from the circulating blood. With the 1046-element array of randomly selected cDNAs from this library, probes made from RA and IBD samples hybridized to a large number of genes. Of these, many were common to the two disease tissues, while others were differentially expressed (data not shown). A complete survey of these genes was beyond the scope of this study, but for this report we picked three genes that were up-regulated in the RA tissue relative to IBD. These cDNAs were sequenced and identified by comparison with the GenBank database. They are TIMP-1, ferritin light chain, and manganese superoxide dismutase (MnSOD). Differential expression of MnSOD was observed only in samples of RA tissue explants maintained in growth medium without serum for 2 to 16 hr. These results also indicate that the expression profile of genes can be altered when explants are transferred to culture conditions.
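Picking genes up-regulated in RA relative to IBD amounts to ranking normalized signals by their ratio between the two tissues. The sketch below is hypothetical: the 2-fold cutoff, the minimum-signal floor (which also keeps a near-zero IBD signal from producing an arbitrarily large ratio), and the function name are assumptions for illustration, not the selection criteria used in this study.

```python
from typing import Dict, List, Tuple


def up_regulated(first: Dict[str, float], second: Dict[str, float],
                 min_fold: float = 2.0, min_signal: float = 1.0) -> List[str]:
    """Genes whose normalized signal in `first` is >= min_fold times that in `second`.

    Hypothetical helper. Genes missing from `second`, or below `min_signal`
    in `first`, are skipped. The denominator is floored at `min_signal`.
    Results are ordered from the largest to the smallest fold change.
    """
    hits: List[Tuple[float, str]] = []
    for gene, a in first.items():
        b = second.get(gene)
        if b is None or a < min_signal:
            continue
        fold = a / max(b, min_signal)
        if fold >= min_fold:
            hits.append((fold, gene))
    hits.sort(reverse=True)
    return [gene for _, gene in hits]
```

Called as `up_regulated(ra_signals, ibd_signals)`, it returns candidate RA-enriched genes for follow-up sequencing. As the MnSOD result shows, the outcome can depend on how the explants were handled before RNA isolation.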

DISCUSSION 

The speed, ease, and feasibility of simultaneously monitoring differential expression of hundreds of genes with the cDNA microarray-based system (1-3) are demonstrated here in the analysis of a complex disease such as RA. Many different cell types in the RA tissue (macrophages, lymphocytes, plasma cells, neutrophils, synoviocytes, chondrocytes, etc.) are known to contribute to the development of the disease through the expression of gene products known to be proinflammatory. These include the cytokines, chemokines, growth factors, MMPs, eicosanoids, and others (7, 11-14); the design of the 96-element known-gene microarray was based on this knowledge and depended on the availability of the genes. The technology was validated by confirming earlier observations on the expression of TNF by the monocyte cell line MM6 and of Col-1 and Col-3 in the chondrosarcoma cells and articular chondrocytes (9, 12). In our time-dependent survey, the chronological order of gene activities within and between gene families was compared, and the results have provided unprecedented profiles of the cytokines (TNF, IL-1, IL-6, GCSF, and MIF), chemokines (MIP-1α, MIP-1β, IL-8, and Groα), certain transcription factors, and the matrix metalloproteinases (GelA, Strom-1, Col-1, Col-3, and HME) in the macrophage cell line MM6 and in the SW1353 chondrosarcoma cells. Earlier reports of cytokine production in the diseased state had established a model in which TNF is a major participant in RA; its expression reportedly preceded that of the other cytokines and effector molecules (4). Our results strongly support this model, as demonstrated in the time course of the MM6 cells, where TNF expression preceded that of IL-1α and -β, followed by IL-6 and GCSF. These expression profiles demonstrate the utility of the microarray method for establishing the hierarchy of signaling events.

In the SW1353 chondrosarcoma cells, all the known MMPs and TIMPs were examined simultaneously. HME expression was discovered, which previously had been observed only in the stromal cells and alveolar macrophages of smokers' lungs and in placental tissue. Its presence in cells of the RA tissue is meaningful because its activity can cause significant destruction of elastin and basement membrane components (16, 17). Expression profiles of synovial fibroblasts and articular chondrocytes were remarkably similar and not very different from those of the SW1353 cells, indicating that the fibroblast and the chondrocyte can play equally aggressive roles in joint erosion. The most prominent genes expressed were the MMPs, but chemokines and cytokines were also produced by these cells. The effect of the anabolic growth factor TGF-β was profoundly evident in the down-regulation of these catabolic activities.

RA tissue samples clearly reflected profiles similar to those of the cell types examined. Active genes observed were IL-3, IL-6, ICE, the MMPs including HME, the TIMPs, the chemokines IL-8, Groα, MIP, MIF, and RANTES, and the adhesion molecule VCAM. Of the growth factors, fibroblast growth factor β was observed most frequently. In comparison, the expression patterns in the other inflammatory state (i.e., IBD) were not as marked as in the RA samples, at least in the tissue samples selected for this study.

As an alternative approach, the 1046-element cDNA microarray of randomly selected genes from a lymphocyte library was used to identify genes expressed in RA tissue (3). Many genes on this array hybridized with probes made from both RA and IBD tissue samples. The results are not surprising, because inflammatory tissue is abundantly supplied with cell types infiltrating from the circulating blood, as is also apparent from the high levels of chemokine expression in RA tissue. Because of the magnitude of the effort required to identify all the hybridized genes, we have chosen for this report to describe only three differentially expressed genes, mainly to verify this method of analysis.

Of the large number of genes observed here, a fair number were already known as active participants in inflammatory disease. These are TNF, IL-1, IL-6, IL-8, GCSF, RANTES, and VCAM. The novel participants not previously reported are HME, IL-3, ICE, and Groα. With our discovery of HME expression in RA, this gene becomes a potential target for drug intervention. ICE is a cysteine protease well known for its IL-1β processing activity (18) and recognized for its role in apoptotic cell death (19); its expression in RA tissue is intriguing. IL-3 is recognized for its growth-promoting activity in hematopoietic cell lineages and is a product of activated T cells (20); its expression in synoviocytes and chondrocytes of RA tissue is a novel observation.

Like IL-8, Groα is a C-X-C subgroup chemokine and a potent neutrophil and basophil chemoattractant. It down-regulates the expression of types I and III interstitial collagens (21, 22) and is seen here to be produced by MM6 cells, primary synoviocytes, and RA tissue. With the presence of the C-C chemokines (23) RANTES, MCP, and MIP-1β, the migration and infiltration of monocytes and, particularly, T cells into the tissue are also enhanced (5), aiding the trafficking and recruitment of leukocytes into the RA tissue. Their activation, phagocytosis, degranulation, and respiratory bursts could be responsible for the induction of MnSOD in RA. MnSOD is also induced by TNF and IL-1 and serves a protective function against oxidative damage. The induction of the ferritin light chain gene in this tissue may be for reasons similar to those for MnSOD. Ferritin is the major intracellular iron storage protein, and it is responsive to intracellular oxidative stress and reactive oxygen intermediates generated during inflammation (24, 25). The active expression of TIMP-1 in RA tissue, as detected by the 1000-element array, is no surprise, because our results have repeatedly shown TIMP-1 to be expressed in the constitutive and induced states of RA cells and tissues.

The suitability of cDNA microarray technology for profiling diseases and for identifying disease-related genes is demonstrated here. This technology could provide new targets for drug development and disease therapies, and in doing so allow improved treatment of chronic diseases that are challenging because of their complexity.
Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray

Mark Schena, Dari Shalon, Ronald W. Davis, Patrick O. Brown

A high-capacity system was developed to monitor the expression of many genes in parallel. Microarrays prepared by high-speed robotic printing of complementary DNAs on glass were used for quantitative expression measurements of the corresponding genes. Because of the small format and high density of the arrays, hybridization volumes of 2 microliters could be used that enabled detection of rare transcripts in probe mixtures derived from 2 micrograms of total cellular messenger RNA. Differential expression measurements of 45 Arabidopsis genes were made by means of simultaneous, two-color fluorescence hybridization.






The temporal, developmental, topographical, histological, and physiological patterns in which a gene is expressed provide clues to its biological role. The large and expanding database of complementary DNA (cDNA) sequences from many organisms (1) presents the opportunity of defining these patterns at the level of the whole genome.

For these studies, we used the small flowering plant Arabidopsis thaliana as a model organism. Arabidopsis possesses many advantages for gene expression analysis, including the fact that it has the smallest genome of any higher eukaryote examined to date (2). Forty-five cloned Arabidopsis cDNAs (Table 1), including 14 complete sequences and 31 expressed sequence tags (ESTs), were used as gene-specific targets. We obtained the ESTs by selecting cDNA clones at random from an Arabidopsis cDNA library. Sequence analysis revealed that 28 of the 31 ESTs matched sequences in the database (Table 1). Three additional cDNAs from other organisms served as controls in the experiments.

The 48 cDNAs, averaging ~1.0 kb, were amplified with the polymerase chain reaction (PCR) and deposited into individual wells of a 96-well microtiter plate. Each sample was duplicated in two adjacent wells to allow the reproducibility of the arraying and hybridization process to be tested. Samples from the microtiter plate were printed onto glass microscope slides in an area measuring 3.5 mm by 5.5 mm with the use of a high-speed arraying machine (3). The arrays were processed by chemical and heat treatment to attach the DNA sequences to the glass surface and denature them (3). Three arrays, printed in a single lot, were used for the experiments here. A single microtiter plate of PCR products provides sufficient material to print at least 500 arrays.
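Because each cDNA occupies two adjacent positions (for example a1,2), the agreement between paired spots gives a direct check on arraying and hybridization reproducibility. The sketch below is one minimal way to quantify that, assuming background-subtracted intensities keyed by (row letter, column number) on the 8 x 12 layout; the function names, the log2-ratio metric, and the example intensities are hypothetical choices of ours, not the authors' analysis.

```python
import math

def duplicate_log2_ratios(intensities):
    """Return log2(left/right) for each duplicate pair of spots.

    intensities: {(row_letter, column_number): background-subtracted signal}
    for rows a-h and columns 1-12, where columns (1, 2), (3, 4), ..., (11, 12)
    hold the same cDNA. Result keys look like 'a1,2' (the Table 1 labels).
    """
    ratios = {}
    for row in "abcdefgh":
        for left in range(1, 12, 2):
            right = left + 1
            a = intensities.get((row, left))
            b = intensities.get((row, right))
            if a is None or b is None or a <= 0 or b <= 0:
                continue  # skip missing or non-positive spots
            ratios[f"{row}{left},{right}"] = math.log2(a / b)
    return ratios

def worst_fold_disagreement(ratios):
    """Largest fold difference between duplicates (1.0 = identical)."""
    return max((2 ** abs(r) for r in ratios.values()), default=1.0)

# Hypothetical intensities for two duplicate pairs.
spots = {("a", 1): 1200.0, ("a", 2): 1000.0, ("b", 1): 400.0, ("b", 2): 800.0}
ratios = duplicate_log2_ratios(spots)
print(worst_fold_disagreement(ratios))  # 2.0
```

Working in log2 makes a twofold excess in either spot of a pair count equally (+1 or -1).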

Fluorescent probes were prepared from total Arabidopsis mRNA (4) by a single round of reverse transcription (5). The Arabidopsis mRNA was supplemented with human acetylcholine receptor (AChR) mRNA at a dilution of 1:10,000 (w/w) before cDNA synthesis, to provide an internal standard for calibration (5). The resulting fluorescently labeled cDNA mixture was hybridized to an array at high stringency (6) and scanned with a laser (3). A high-sensitivity scan gave signals that saturated the detector at nearly all of the Arabidopsis target sites (Fig. 1A). Calibration relative to the AChR mRNA standard (Fig. 1A) established a sensitivity limit of ~1:50,000. No detectable hybridization was observed to either the rat glucocorticoid receptor (Fig. 1A) or the yeast TRP4 (Fig. 1A) targets even at the highest scanning sensitivity. A moderate-sensitivity scan of the same array allowed linear detection of the more abundant transcripts (Fig. 1B). Quantitation of both scans revealed a range of expression levels spanning three orders of magnitude for the 45 genes tested (Table 2). RNA blots (7) for several genes (Fig. 2) corroborated the expression levels measured with the microarray to within a factor of 5 (Table 2).
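The AChR spike makes the calibration possible: because its abundance (w/w) is known, a target's abundance can be estimated from the ratio of its signal to the spike's signal. A minimal sketch, assuming the scanner response is linear for both target and spike (the text notes that abundant transcripts were only linear in the moderate-sensitivity scan) and that signal per unit of mRNA is the same for every target; both are simplifying assumptions, and the helper names and intensities are hypothetical. The ~1:50,000 detection limit and the factor-of-5 agreement criterion come from the text.

```python
def abundance_from_spike(signal, spike_signal, spike_dilution):
    """Estimate a transcript's abundance (w/w) from a spiked-in standard.

    spike_dilution is N for a 1:N spike (e.g. 10_000 for 1:10,000).
    Assumes signal is proportional to abundance for both target and spike.
    """
    if spike_signal <= 0:
        raise ValueError("spike signal must be positive")
    return (signal / spike_signal) / spike_dilution

def one_in_n(abundance):
    """Express an abundance as the N of '1:N'."""
    return round(1 / abundance)

def is_detectable(abundance, limit_n=50_000):
    """Compare against the ~1:50,000 sensitivity limit reported above."""
    return abundance >= 1 / limit_n

def within_factor(a, b, k):
    """True if two estimates of the same level agree within a factor of k."""
    return max(a, b) / min(a, b) <= k

level = abundance_from_spike(2500.0, 250.0, 10_000)  # hypothetical intensities
print(f"1:{one_in_n(level)}")             # 1:1000
print(is_detectable(level))               # True
print(within_factor(level, 1 / 3000, 5))  # True (about 3-fold apart)
```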

Differential gene expression was investigated with a simultaneous, two-color hybridization scheme, which served to minimize experimental variation inherent in the comparison of independent hybridizations. Fluorescent probes were prepared from two mRNA sources with the use of reverse transcriptase in the presence of fluorescein- and lissamine-labeled nucleotide analogs, respectively (5). The two probes were then mixed together in equal proportions, hybridized to a single array, and scanned separately for fluorescein and lissamine emission after independent excitation of the two fluorophores (3).

[Fig. 1 panels: (A) High sensitivity; (B) Moderate sensitivity; (C) Wild type; (D) HAT4 transgenic; (E) Root tissue; (F) Leaf tissue. Rows a to h and columns 1 to 12 mark target positions; color bar: expression level (w/w).]

To test whether overexpression of a single gene could be detected in a pool of total Arabidopsis mRNA, we used a microarray to analyze a transgenic line overexpressing the single transcription factor HAT4 (8). Fluorescent probes representing mRNA from wild-type and HAT4-transgenic plants were labeled with fluorescein and lissamine, respectively; the two probes were then mixed and hybridized to a single array. An intense hybridization signal was observed at the position of the HAT4 cDNA in the lissamine-specific scan (Fig. 1D), but not in the fluorescein-specific scan of the same array (Fig. 1C). Calibration with AChR mRNA added to the fluorescein and lissamine cDNA synthesis reactions at dilutions of 1:10,000 (Fig. 1C) and 1:100 (Fig. 1D), respectively, revealed a 50-fold elevation of HAT4 mRNA in the transgenic line relative to its abundance in wild-type plants (Table 2). This magnitude of HAT4 overexpression matched that inferred from the Northern (RNA) analysis within a factor of 2 (Fig. 2 and Table 2). Expression of all the other genes monitored on the array differed by less than a factor of 5 between HAT4-transgenic and wild-type plants (Fig. 1, C and D, and Table 2). Hybridization of fluorescein-labeled glucocorticoid receptor cDNA (Fig. 1C) and lissamine-labeled TRP4 cDNA (Fig. 1D) verified the presence of the negative control targets and the lack of optical cross talk between the two fluorophores.

Fig. 1. Gene expression monitored with the use of cDNA microarrays. Fluorescent scans represented in pseudocolor correspond to hybridization densities. Color bars were calibrated from the signal obtained with the use of known concentrations … experiments. Numbers and letters on the axes mark the position of each cDNA. (A) High-sensitivity fluorescein scan after hybridization with fluorescein-labeled cDNA derived from wild-type plants. (B) Same array as in (A) but scanned at moderate sensitivity. (C and D) A single array was probed with a 1:1 mixture of fluorescein-labeled cDNA from wild-type plants and lissamine-labeled cDNA from HAT4-transgenic plants. The single array was then scanned successively to detect the fluorescein fluorescence corresponding to mRNA from wild-type plants (C) and the lissamine fluorescence corresponding to mRNA from HAT4-transgenic plants (D). (E and F) A single array was probed with a 1:1 mixture of fluorescein-labeled cDNA from root tissue and lissamine-labeled cDNA from leaf tissue. The single array was then scanned successively to detect the fluorescein fluorescence corresponding to mRNAs expressed in roots (E) and the lissamine fluorescence corresponding to mRNAs expressed in leaves (F).

Fig. 2. Gene expression monitored with RNA (Northern) blot analysis. Designated amounts of mRNA from wild-type and HAT4-transgenic plants were spotted onto nylon membranes and probed with the cDNAs indicated. Purified human AChR mRNA was used for calibration.

To explore a more complex alteration in expression patterns, we performed a second two-color hybridization experiment with fluorescein- and lissamine-labeled probes prepared from root and leaf mRNA, respectively. The scanning sensitivities for the two fluorophores were normalized by matching the signals resulting from AChR mRNA, which was added to both cDNA synthesis reactions at a dilution of 1:1000 (Fig. 1, E and F). A comparison of the scans revealed widespread differences in gene expression between root and leaf tissue (Fig. 1, E and F). The mRNA from the light-regulated CAB1 gene was ~500-fold more abundant in leaf (Fig. 1F) than in root tissue (Fig. 1E). The expression of 26 other genes differed between root and leaf tissue by more than a factor of 5 (Fig. 1, E and F).
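The root/leaf comparison rests on two steps. First, one channel is rescaled so that the AChR spike, added to both reactions at 1:1000, reads the same in both scans. Second, genes whose rescaled ratio exceeds a factor of 5 in either direction are called. A minimal sketch of those two steps, assuming per-gene, background-subtracted intensities for each channel; the function name, dictionary layout, and all intensities in the example are hypothetical and are not the authors' data or software.

```python
def normalize_and_compare(ch1, ch2, control="AChR", threshold=5.0):
    """Rescale channel 2 so its control spot matches channel 1, then return
    {gene: ch2/ch1 ratio} for genes whose ratio is above `threshold` or
    below 1/threshold. The control spot itself is excluded."""
    scale = ch1[control] / ch2[control]
    flagged = {}
    for gene, s1 in ch1.items():
        if gene == control or gene not in ch2 or s1 <= 0:
            continue  # skip the spike, unmatched genes, and empty spots
        ratio = ch2[gene] * scale / s1
        if ratio > threshold or ratio < 1 / threshold:
            flagged[gene] = ratio
    return flagged

# Hypothetical intensities: fluorescein = root, lissamine = leaf.
root = {"AChR": 1000.0, "CAB1": 20.0, "ROC1": 900.0}
leaf = {"AChR": 500.0, "CAB1": 5000.0, "ROC1": 600.0}
print(normalize_and_compare(root, leaf))  # {'CAB1': 500.0}
```

Here the leaf scan reads the spike at half the root value, so every leaf signal is doubled before ratios are taken. After that correction, ROC1 differs by only ~1.3-fold and is not flagged.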

The HAT4-transgenic line we examined has elongated hypocotyls, early flowering, poor germination, and altered pigmentation (8). Although changes in expression were observed for HAT4, large changes in expression were not observed for any of the other 44 genes we examined. This was somewhat surprising, particularly because comparative analysis of leaf and root tissue identified 27 differentially expressed genes. Analysis of an expanded set of genes may be required to identify genes whose expression changes upon HAT4 overexpression; alternatively, a comparison of mRNA populations from specific tissues of wild-type and HAT4-transgenic plants may allow identification of downstream genes.

Table 1. Sequences contained on the cDNA microarray. Shown … in this study matched a sequence in the database. NADH …

Position  cDNA    Function                        Accession number
a1,2      AChR    Human AChR                      *
a3,4      EST3    Actin                           H36236
a5,6      EST6    NADH dehydrogenase              Z27010
a7,8      AAC1    Actin 1                         M20016
a9,10     EST12   Unknown                         U36594†
a11,12    EST13   Actin                           T45783
b1,2      CAB1    Chlorophyll a/b binding         M85150
b3,4      EST17   Phosphoglycerate kinase         T44490
b5,6      GA4     Gibberellic acid biosynthesis   L37126
b7,8      EST19   Unknown                         U36595†
b9,10     GBF-1   G-box binding factor 1          X63894
b11,12    EST23   Elongation factor               X52256
c1,2      EST29   Aldolase                        T04477
c3,4      GBF-2   G-box binding factor 2          X63895
c5,6      EST34   Chloroplast protease            R87034
c7,8      EST35   Unknown                         T14152
c9,10     EST41   Catalase                        T22720
c11,12    rGR     Rat glucocorticoid receptor     M14053
d1,2      EST42   Unknown                         U36596†
d3,4      EST45   ATPase                          J04185
d5,6      HAT1    Homeobox-leucine zipper 1       U09332
d7,8      EST46   Light harvesting complex        T04063
d9,10     EST49   Unknown                         T76267
d11,12    HAT2    Homeobox-leucine zipper 2       U09335
e1,2      HAT4    Homeobox-leucine zipper 4       M90394
e3,4      EST50   Phosphoribulokinase             T04344
e5,6      HAT5    Homeobox-leucine zipper 5       M90416
e7,8      EST51   Unknown                         Z33675
e9,10     HAT22   Homeobox-leucine zipper 22      U09336
e11,12    EST52   Oxygen evolving                 T21749
f1,2      EST59   Unknown                         Z34607
f3,4      KNAT1   Knotted-like homeobox 1         U14174
f5,6      EST60   RuBisCO small subunit           X14564
f7,8      EST69   Translation elongation factor   T42799
f9,10     PPH1    Protein phosphatase 1           U34803
f11,12    EST70   Unknown                         T44621
g1,2      EST75   Chloroplast protease            T43698
g3,4      EST78   Unknown                         R65481
g5,6      ROC1    Cyclophilin                     L14844
g7,8      EST82   GTP binding                     X59152
g9,10     EST83   Unknown                         Z33795
g11,12    EST84   Unknown                         T45278
h1,2      EST91   Unknown                         T13832
h3,4      EST96   Unknown                         R64816
h5,6      SAR1    Synaptobrevin                   M90418
h7,8      EST100  Light harvesting complex        Z18205
h9,10     EST103  Light harvesting complex        X03909
h11,12    TRP4    Yeast tryptophan biosynthesis   X04273

*Proprietary sequence of Stratagene (La Jolla, California).
†No match in the database; novel EST.

At the current density of robotic printing, it is feasible to scale up the fabrication process to produce arrays containing 20,000 cDNA targets. At this density, a single array would be sufficient to provide gene-specific targets encompassing nearly the entire repertoire of expressed genes in the Arabidopsis genome (2). The availability of 20,274 ESTs from Arabidopsis (1, 9) would provide a rich source of templates for such studies.

The estimated 100,000 genes in the human genome (10) exceed the number of Arabidopsis genes by a factor of 5 (2). This modest increase in complexity suggests that similar cDNA microarrays, prepared from the rapidly growing repertoire of human ESTs (1), could be used to determine the expression patterns of tens of thousands of human genes in diverse cell types. Coupling an amplification strategy to the reverse transcription reaction (11) could make it feasible to monitor expression even in minute tissue samples. A wide variety of acute and chronic physiological and pathological conditions might lead to characteristic changes in the patterns of gene expression in peripheral blood cells or other easily sampled tissues. In concert with cDNA microarrays for monitoring complex expression patterns, these tissues might therefore serve as sensitive in vivo sensors for clinical diagnosis. Microarrays of cDNAs could thus provide a useful link between human gene sequences and clinical medicine.



Table 2. Gene expression monitoring by microarray and RNA blot analyses; tg, HAT4-transgenic. See Table 1 for additional gene information. Expression levels (w/w) were calibrated with the use of known amounts of human AChR mRNA. Values for the microarray were determined from microarray scans (Fig. 1); values for the RNA blot were determined from RNA blots (Fig. 2).

Gene         Expression level (w/w)
             Microarray    RNA blot
CAB1         1:48          1:83
CAB1 (tg)    1:120         1:150
HAT4         1:8300        1:6300
HAT4 (tg)    1:150         [illegible]
ROC1         1:1200        1:1800
ROC1 (tg)    [illegible]   1:1300
Gene Therapy in Peripheral Blood Lymphocytes and Bone Marrow for ADA Immunodeficient Patients

Claudio Bordignon, Luigi D. Notarangelo, Nadia Nobili, Giuliana Ferrari, Giulia Casorati, Paola Panina, Evelina Mazzolari, Daniela Maggioni, Claudia Rossi, Paolo Servida, Alberto G. Ugazio, Fulvio Mavilio

Adenosine deaminase (ADA) deficiency results in severe combined immunodeficiency, the first genetic disorder treated by gene therapy. Two different retroviral vectors were used to transfer ex vivo the human ADA minigene into bone marrow cells and peripheral blood lymphocytes from two patients undergoing exogenous enzyme replacement therapy. After 2 years of treatment, long-term survival of T and B lymphocytes, marrow cells, and granulocytes expressing the transferred ADA gene was demonstrated and resulted in normalization of the immune repertoire and restoration of cellular and humoral immunity. After discontinuation of treatment, T lymphocytes, derived from transduced peripheral blood lymphocytes, were progressively replaced by marrow-derived T cells in both patients. These results indicate successful gene transfer into long-lasting progenitor cells, producing a functional multilineage progeny.



Severe combined immunodeficiency associated with inherited deficiency of ADA (1) is usually fatal unless affected children are kept in protective isolation or the immune system is reconstituted by bone marrow transplantation from a human leukocyte antigen (HLA)-identical sibling donor (2). This is the therapy of choice, although it is available only for a minority of patients. In recent years, other forms of therapy have been developed, including transplants from haploidentical donors (3, 4), exogenous enzyme replacement (5), and somatic-cell gene therapy (6-9).

C. Bordignon, N. Nobili, G. Ferrari, D. Maggioni, C. Rossi, P. Servida, F. Mavilio, Telethon Gene Therapy Program for Genetic Diseases, DIBIT, Istituto Scientifico H. S. Raf…
P. Panina, Roche Milano Ricerche, Milan, Italy.

We previously reported a preclinical model in which ADA gene transfer and expression successfully restored immune functions in human ADA-deficient (ADA⁻) peripheral blood lymphocytes (PBLs) in immunodeficient mice in vivo (10, 11). On the basis of these preclinical results, the clinical application of gene therapy for the treatment of ADA⁻ SCID (severe combined immunodeficiency disease) patients who previously failed exogenous enzyme replacement therapy was approved by our Institutional Ethical Committees and by the Italian National Committee for Bioethics (12). In addition to evaluating the safety and efficacy of the gene therapy procedure, the aim of the study was to define the relative role of PBLs and hematopoietic stem cells in the long-term reconstitution of immune functions after retroviral vector-mediated ADA gene transfer. For this purpose, two structurally identical vectors expressing the human ADA complementary DNA (cDNA), distinguishable by the presence of alternative restriction sites in a nonfunctional region of the viral long-terminal repeat (LTR), were used to transduce PBLs and bone marrow (BM) cells independently. This procedure allowed identification of the origin of





Attachment 4 of 11
In USSN: 09/918,624
PA-0033 US



GENETIC ENGINEERING NEWS

BIOPROCESS



Pharmagene Raises More Capital for Research on Human Tissues

By Sophia Fox

Pharmagene, the Royston, UK-based biopharmaceutical company specialising in the use of human biomaterials for drug discovery research, has raised a further £5 million from a group of investors led by 3i and Abacus Nominees. The funding will enable the company to expand both its human biomaterials collection and its capabilities across a range of proprietary platform technologies.

Gordon Baxter, Ph.D., Pharmagene's cofounder and chief operating officer, claimed "by the end of this year Pharmagene will have access to the largest collection of human RNAs and proteins anywhere in the world and a range of innovative, yet robust technologies
SEE PHARMAGENE, P. 9
Perkin-Elmer Acquires PerSeptive to Expand Its Capabilities in Gene-Based Drug Discovery

By John Sterling

Perkin-Elmer's (PE; Norwalk, CT) decision last month to acquire PerSeptive Biosystems (Framingham, MA) via a $360 million stock swap was designed to strengthen PE in terms of broad capabilities in gene-based drug discovery. The company's main goal is to develop new products to improve the integration of genetic and protein research.

"This merger will enhance our position as an effective provider of innovative, integrated platforms enabling our customers to be more efficient and cost-effective in bringing new pharmaceuticals to market," says Tony L. White, PE's chairman, president and CEO. "The combination of our two companies should bolster our presence in the life sciences, [and it is our] belief that we must take bold action now to lead the emerging era of molecular medicine with leading positions in both genetic and protein analysis."

A driving force behind the merger is the vast amount of genetic information about human disease that is being accumulated by researchers and biotech companies working in the area of genomics. It is becoming increasingly obvious that these data need to be complemented with technologies for studying proteins and protein networks, a field known as proteomics (see GEN, September 1, 1997, p. 1).

PE officials, who claim that MALDI-TOF (Matrix Assisted
SEE ACQUISITION, P. 10

Perkin-Elmer acquired PerSeptive Biosystems for $360 million to obtain new technologies in spectrometry, bioseparations and purification for product projects, spanning the range from genomics to proteomics.

FDA OKs Genzyme's Carticel Product for Damage to Knees

[Figure: schematic of the Carticel procedure; labels: Defect, Periosteal flap, Genzyme Tissue Repair, Cell Processing.]

Carticel, which was approved for the repair of clinically significant, symptomatic cartilaginous defects of the femoral condyle (medial, lateral or trochlear) caused by acute or repetitive trauma, employs a proprietary process to grow autologous cartilage cells for implantation.

By Naomi Pfeiffer

The FDA has approved a knee-cartilage replacement product made by Genzyme Tissue Repair (Cambridge, MA), a tracking-stock division of Genzyme Corp., for people with trauma-damaged knees.

Carticel (autologous cultured chondrocytes) is the first product to be licensed under the FDA's pro-
SEE GENZYME, P. 6



Strategies for Target Validation Streamline Evaluation of Leads

By Vicki Glaser

Acacia Biosciences (Richmond, CA) last month … its first agreement with a major pharmaceutical company, signing a deal with Eli Lilly (Indianapolis, IN) to use Acacia's Genome Reporter Matrix (GRM) to select and optimize some of Lilly's lead compounds. Acacia's yeast-based system for profiling drug activity is useful for evaluating the therapeutic potential of lead compounds, and it also has a role in the identification and validation of new drug targets.

"We're using the ecosystem of a cell to allow us to deduce the mechanism of action and target for any chemical," explains Bruce Cohen, president and CEO. "We screen for every target in a cell simultaneously...using transcription as a readout for how a cell is adapting to any perturbation," he says.

The GRM technology consists of two main databases: one is the genetic response profile, showing the effects of mutations in each individual yeast gene and compensatory gene regulatory mechanisms; the other is the chemical response profile, which documents changes in gene expression in response to chemical compounds. Computational analysis and pattern matching between the genetic and chemical profiles yields information on the specificity, potency and side-effects risk of a drug lead.
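The article does not describe how Acacia performs its pattern matching. One simple and widely used approach is to correlate a compound's expression-change profile with each mutant's profile and rank mutants by similarity, on the reasoning that a drug that inhibits a gene product should mimic a mutation in that gene. A minimal sketch using Pearson correlation over the reporter genes the two profiles share; all names and profile values are hypothetical, and this is not a description of the GRM software.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def rank_mutants(chemical_profile, genetic_profiles, min_shared=3):
    """Rank mutant profiles by similarity to a compound's profile.

    chemical_profile: {reporter_gene: log expression ratio after treatment}
    genetic_profiles: {mutant: {reporter_gene: log ratio in that mutant}}
    Returns [(mutant, r), ...], most similar first.
    """
    scores = []
    for mutant, profile in genetic_profiles.items():
        shared = sorted(set(chemical_profile) & set(profile))
        if len(shared) < min_shared:
            continue  # too few common reporters to compare
        r = pearson([chemical_profile[g] for g in shared],
                    [profile[g] for g in shared])
        scores.append((mutant, r))
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical data: mutA mimics the compound, mutB is its mirror image.
compound = {"g1": 2.0, "g2": -1.0, "g3": 0.5, "g4": 0.0}
mutants = {
    "mutA": {"g1": 1.0, "g2": -0.5, "g3": 0.25, "g4": 0.0},
    "mutB": {"g1": -2.0, "g2": 1.0, "g3": -0.5, "g4": 0.0},
}
for name, r in rank_mutants(compound, mutants):
    print(name, round(r, 3))
```

Correlation ignores overall scale. A weak compound that produces a faint copy of a mutant's signature still ranks that mutant first.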

Targeting Targets

No longer is mapping and sequencing a gene (or the human genome) an end unto itself, but
SEE TARGET, P. 16



Sticky Ends

Avigen received two grants from the NIH and the University of California for research on gene therapy for treatment of cancer and HIV infections... KRL Pharmaceutical Services, of Reston, VA, launched the TSN Bug Finder, which is able to locate and retrieve client-specified microorganisms in real time... Gensia Sicor, Inc. will move its corporate staff from San Diego to Irvine, CA, by end of year... FDA accepted NDA from Sepracor for levalbuterol HCl inhalation solution... An $11.7M mezzanine financing has been closed by Activated Cell Therapy, which changed its name to Dendreon Corporation... Astra AB will build a major research facility in Waltham, MA, and is also relocating its Astra Arcus research facility from Rochester to the Boston area... A Prolifix Ltd. team used a small peptide to inhibit the E2F protein complex and induced apoptosis in mammalian tumor cells... Vertex Pharmaceuticals, Inc. and Alpha Therapeutic Corp. ended an agreement to develop VX-366 for treatment of inherited hemoglobin disorders... NaviCyte received a Phase I SBIR grant for up to $100,000 from NIH for development of a prototype of its NaviFlow technology for high-throughput screening... Covance Inc. will invest $21 million in expansion and renovation of its facility in Indianapolis, IN.
Target

merely a means to an end. The critical next step is to validate the gene and its protein product as a potential drug target. The Human Genome Project continues to produce a treasure chest of expressed sequence tags (ESTs) and a tantalizing array of complete gene sequences.

Companies are applying a variety of functional genomic strategies to link genes to specific diseases and to multigenic phenotypes. Yet the ultimate challenge for pharmaceutical companies is to sift through all the sequence and differential gene expression data to identify the best targets for drug discovery.

Spinning off technology developed at the University of North Carolina (Chapel Hill), Cytogen Corp. (Princeton, NJ) formed its wholly owned subsidiary AxCell Biosciences earlier this year. The young company is building a protein interaction database, cataloging all the interactions the modular domains of proteins can engage in with a range of ligands, in order to gain insight into protein function and to select the most critical interaction to target for drug development.

AxCell's cloning-of-ligand-targets (COLT) technology employs "recognition units" from the company's genetic diversity library (GDL) to map functional protein interactions and quantitate their affinity. The company's inter-functional proteomic database (IFP-…) elucidates protein interaction networks and structure-activity relationships based on ligand affinity with protein modular domains.

Defining Disease Pathways

Signal Pharmaceuticals, Inc.'s (San Diego, CA) integrated drug target and discovery effort is based on mapping gene-regulating pathways in cells and identifying small molecules that regulate the activation of those genes. In collaboration with academic researchers, the company has identified a large number of regulatory proteins in several mitogen-activated protein (MAP) kinase pathways (including the JNK, ERK and p38 signaling pathways), which Signal is evaluating for the treatment of autoimmune, inflammatory, cardiovascular and neurologic diseases, and cancer. Other target identification programs focus on the NF-κB pathway, estrogen-related genes and central/peripheral nervous system genes.

Regulating cytokine production in immune and inflammatory disorders and modifying bone metabolism to treat osteoporosis are the focus of Signal's collaboration with Tanabe Seiyaku (Osaka, Japan). Signal has partnered with Organon/Akzo Nobel (Netherlands) to identify estrogen-responsive genes as targets for treating neurodegenerative and psychiatric diseases, atherosclerosis and ischemia, and with Roche Bioscience (Palo Alto, CA) to develop human peripheral nerve cell lines for the discovery of treatments for pain and incontinence.

Exelixis' (S. San Francisco, CA) strategy for target selection is to define disease pathways and identify regulatory molecules that activate or inhibit those biochemical/genetic pathways. Based on the finding that these pathways are conserved across species, the company is studying the model genetic systems of Drosophila and Caenorhabditis elegans. Using its Path Finder technology, Exelixis systematically introduces mutations into the genomes of these model organisms, looking for mutations that enhance or suppress the target disease-related gene. These novel genes then become the basis of drug screening assays.
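The modifier-screen logic described here can be made concrete. In an animal that already carries the disease-related mutation, each additional mutation is scored by whether it worsens (enhances) or alleviates (suppresses) the phenotype. A minimal sketch, assuming a numeric severity score where higher means more severe; the 1.5-fold cut-off, mutation names, and scores are hypothetical, and the code does not describe Exelixis's Path Finder technology.

```python
def classify_modifiers(baseline, scores, fold=1.5):
    """Label each mutation by how it shifts a sensitized disease phenotype.

    baseline: severity of the disease-model strain alone (must be > 0).
    scores:   {mutation: severity when that mutation is added}.
    fold:     hypothetical cut-off for calling a change meaningful.
    """
    if baseline <= 0:
        raise ValueError("baseline severity must be positive")
    labels = {}
    for mutation, score in scores.items():
        if score >= baseline * fold:
            labels[mutation] = "enhancer"    # phenotype made worse
        elif score <= baseline / fold:
            labels[mutation] = "suppressor"  # phenotype alleviated
        else:
            labels[mutation] = "no effect"
    return labels

# Hypothetical severity scores (higher = more severe).
print(classify_modifiers(10.0, {"mut1": 18.0, "mut2": 4.0, "mut3": 11.0}))
```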

Cadus Pharmaceutical Corp. (Tarrytown, NY) is identifying surrogate ligands to newly discovered orphan G-protein coupled transmembrane receptors of unknown function to determine the suitability of the receptors as drug targets. Inserting the novel receptor in a yeast system yields a ligand that activates the receptor. Access to a surrogate ligand allows the company to screen for receptor antagonists in the yeast system.

"The antagonist plus the surrogate ligand gives you two probes, an on probe and an off probe, which allows you to look at function," explains David Webb, Ph.D., vp of research and chief scientific officer. A surrogate ligand also provides information on which G-protein interacts with the orphan receptor and its associated signaling pathways, further clarifying the role of the receptor as a potential drug target. Cadus' collaboration with SmithKline (Philadelphia) capitalizes on Cadus' ability to determine orphan receptor function, applying the technology to SmithKline's proprietary, newly discovered G-protein receptors.
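One common way to use an "on" control and an "off" control in a screen is to place each test readout on a 0 to 100% inhibition scale between them. The sketch below assumes a reporter whose signal rises with receptor activation. It illustrates that general normalization and is not Cadus's assay; the compound names and readings are hypothetical.

```python
def percent_inhibition(signal, on_signal, off_signal):
    """Place a test readout on a 0-100% scale between two controls.

    on_signal:  receptor activated by the surrogate ligand alone (0%).
    off_signal: surrogate ligand plus a known antagonist (100%).
    """
    window = on_signal - off_signal
    if window == 0:
        raise ValueError("on and off controls give the same signal")
    return 100.0 * (on_signal - signal) / window

# Hypothetical reporter readings from a yeast assay plate.
on, off = 1000.0, 100.0
for name, reading in {"cmpdA": 550.0, "cmpdB": 980.0}.items():
    print(name, round(percent_inhibition(reading, on, off), 1))
```

With both controls on the same plate, a readout halfway between them maps to 50% inhibition regardless of the plate's absolute signal level.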

Cadus' recombinant yeast system can also be used to screen cell and tissue extracts for natural ligands, and the company is accelerating its internal drug-discovery efforts in the areas of cancer, inflammation and allergy. A recent equity investment in Axiom Biotechnologies (San Diego, CA) gave Cadus a license to Axiom's high-throughput pharmacologic screening system for lead optimization and discovery.

As its name implies, gene/Networks (Alameda, CA) focuses on identifying gene networks that contribute to multigenic phenotypes and complex disease processes. The integration of mouse and human genetic studies forms the basis of the technology. The Genome Tagged Mice database in development will serve as a library of natural mouse genetic and phenotypic variation. Disease-related genes identified in mice are then evaluated in human family- and population-based studies to confirm their clinical relevance and linkages to pathophysiologic traits.

Blocking Gene Expression

Inactivating a gene known to be expressed in association with a particular disease is one approach to identifying appropriate therapeutic targets. The target validation and discovery program at Ribozyme Pharmaceuticals, Inc. (Boulder, CO) applies the company's ribozyme technology to achieve selective inhibition of gene expression in cell culture and in animals.

Correlation of the gene expression inhibition with phenotype can suggest the relative importance of the gene in disease pathology. The company's nuclease-resistant ribozymes form the basis of a collaboration with Schering AG (Germany) for drug target validation and the development of ribozyme-based therapeutic agents, and with Chiron Corp. (Emeryville, CA) for target validation.

With several antisense compounds now progressing through clinical trials, the concept of using oligonucleotides to inhibit gene activity is not new. But rather than focusing on therapeutics development, Sequitur, Inc. (Natick, MA) is creating antisense compounds for the purpose of determining gene function and validating drug targets. Clients typically provide the one-year-old company with the sequence (or EST) of a potential gene target and, in return, Sequitur custom designs a series of three to six antisense compounds that yield a three-to-ten-fold inhibition of the target gene in cell culture. The company also provides oligofectins, a series of cationic lipids, to deliver the oligonucleotides to a variety of cultured cells.
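A fold-inhibition figure converts directly into the fraction of target expression removed. An n-fold reduction leaves 1/n of the original level, so the knockdown is 1 - 1/n. For the three-to-ten-fold range quoted, that is roughly 67% to 90%. A minimal sketch of the conversion (the function name is our own):

```python
def knockdown_percent(fold_inhibition):
    """Percent of target expression removed by an n-fold inhibition."""
    if fold_inhibition < 1:
        raise ValueError("fold inhibition must be >= 1")
    return 100.0 * (1.0 - 1.0 / fold_inhibition)

for fold in (3, 10):
    print(f"{fold}-fold inhibition -> {knockdown_percent(fold):.1f}% knockdown")
```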

"Differential expression information is just for correlation; it doesn't tell function or confirm what would be a good target," says Tod Woolf, Ph.D., director of technology development at Sequitur. Antisense compounds, by contrast, will inhibit a target. Sequitur offers both phosphorothioate DNA antisense compounds and its proprietary Next Generation chimeric oligonucleotides, which have a higher hybridization affinity, greater specificity and reduced toxicity, according to the company.

Mining Pathogen Genomes 

Companies such as Human Genome Sciences (HGS; Rockville, MD), Incyte (Palo Alto, CA), Millennium Pharmaceuticals Inc. (Cambridge, MA) and Genome Therapeutics (Waltham, MA) are relying on high-speed DNA sequencing, positional cloning and other strategies to identify specific microbial genomic sites that would be good targets for infectious disease therapeutics.

AxCell Biosciences scientists say their technology enables the rapid and simple functional identification of the two essential molecular components of protein interaction networks: specific recognition units that bind distinct modular protein domains are identified and isolated using a combined structural/functional approach … Diversity Libraries (GDL) and bioinformatics, and Cloning of Ligand Targets (COLT) technology uses recognition units as functional probes to isolate families of interactor proteins.

HGS recently completed sequenc- 
ing of the bacterial pathogen 
Streptococcus pneumoniae, which is 
the focus of an agreement with 
Hoffmann-La Roche (Basel, 
Switzerland). Roche will use the 
sequence data to develop new anti- 
infectives against S. pneumoniae. 
HGS and Roche have expanded their 
collaboration to include a nonexclu- 
sive license to access sequence infor- 
mation for the intestinal bacterium 
Enterococcus faecalis.

Incyte Pharmaceuticals has completed one-fold coverage of the Candida albicans genome, identifying 60% of the genes of this fungal pathogen. This genome will become part of the company's PathoSeq microbial database. Incyte recently introduced the ZooSeq animal gene sequence and expression database. The database will provide information across various species commonly used in preclinical drug testing, which may help to better define potential drug targets.

Millennium Pharmaceuticals continues to report success in identifying novel drug targets, having recently discovered a novel chemokine called neurotactin and a new class of MAD-related proteins that inhibit transforming growth factor beta (TGF-β) signaling. The company also received U.S. patent coverage for the tub genes, believed to play a role in obesity, and for the gene that encodes the protein melastatin, which appears to suppress metastasis in malignant melanoma.




Smith, now a computer programmer, is an expert in systems integration, Internet technologies and the application of industrial engineering principles to the drug discovery process. Before co-founding Pangea, he was the manager of software development at Attorney's Briefcase, a legal research software company.

By being "in the trenches" with customers and collaborators, Bellenson and Smith sensed the frustration of pharmaceutical researchers whose incompatible tools have impeded their progress. According to Bellenson, "Most of them are geared toward analyzing one molecule at a time. It's like emptying the ocean with an eye dropper, an incompatible eye dropper at that. A pharmaceutical company may have 30 different drug discovery teams with various approaches. The problem is to manage the process of experimenting with a lot of different approaches, to automate while maintaining flexibility."

GeneWorld 2.1 enables "integration of the entire target discovery and validation process," Bellenson says. The commercial software package coordinates the entire process of sequence-data analysis and can be integrated with other programs and databases, according to Smith, who adds that it handles thousands of sequence results, organizes and automates annotation and seamlessly interacts with growing genome databases. Simple forms and menus enable users to turn raw sequence data into crucial knowledge for drug discovery by applying algorithms to sequences, creating custom analysis strategies and producing useful reports, without the need for writing computer code. GeneWorld 2.1 runs on a variety of platforms and operating systems.

Pairing industrial relational database-management systems with a web-browser interface, Pangea's Operating System of Drug Discovery is an open computing framework that allows client/server and Java-enabled web-based technologies to collect, organize and analyze drug discovery information for pharmaceutical companies to simplify and accelerate drug discovery. The technology unites automated genomics database analysis for drug target site selection, chemical information database analysis, large-scale combinatorial chemistry project management and high-throughput screening project management for drug lead efficacy analysis. Pangea officials maintain that these integrated elements provide a unified environment for chemists, biologists and others involved in the drug discovery process to work together with commercial and public domain software.

[Figure: screenshot of an analysis Strategy, with steps including a keyword search and "Identify conserved domains."] Bioinformaticists can design and save Strategies, such as the one shown here, that forward data logically and automatically. Researchers throughout your organization can apply the same Strategies to their own data.

Pangea's Operating System of Drug Discovery can accommodate Sybase, Oracle or Informix relational database-management systems and any version of UNIX. It absorbs new data formats, databases, algorithms and analysis paradigms into the automated workflow without software modifications. Netscape Navigator provides a friendly user interface from PC, Macintosh and UNIX workstations.

In the near term, Pangea plans to complete its bioinformatics core with two more programs. Gene Foundry, a sample-tracking and workflow sequence package for DNA sequence and fragment information, will also offer interaction with robots, reagent tracking and troubleshooting. Gene Thesaurus, the other package, is a "warehouse of bioinformatics data," says Bellenson.

GTAC Chairman, Professor Norman C. Nevin, said 1996 saw "four important developments": an increase in enquiries and submissions made to GTAC; an increase in the complexity of submitted protocols; a continuing shift from gene therapy for single-gene disorders toward strategies aimed at tumour destruction in cancer; and a growth in international sponsorship of U.K. gene therapy trials.

Since 1993, GTAC and its predecessor, the Clothier Committee, have approved 18 U.K. gene therapy clinical trials (13 of which have been carried out), which are listed in the report. The disease areas targeted by these trials include severe combined immunodeficiency (1 trial), cystic fibrosis (6), metastatic melanoma (2), lymphoma (2), neuroblastoma (1), breast cancer (1), Hurler's syndrome (1), cervical cancer (1), glioblastoma



breast cancer, breast cancer with liver metastases, glioblastoma, malignant ascites due to gastrointestinal cancer and ovarian cancer.

Copies of the GTAC third annual report are available from the GTAC Secretariat, Wellington House, 133-155 Waterloo Road, London SE1 8UG, U.K.

Coated Lenses Prevent PCO 

Scientists in the U.K. say it may be possible to prevent posterior capsule opacification (PCO), a common complication following cataract surgery, by using the implanted polymethylmethacrylate (PMMA) intraocular lens as a drug delivery system. PCO occurs in 30-50% of cataract surgery patients as a result of stimulated cell growth within the remaining capsular bag. The condition causes a decline in visual acuity and requires expensive laser treatment, which undermines the routine use of cataract surgery in underdeveloped countries, explains G. Duncan, at the
Exploring the Metabolic and Genetic Control of
Gene Expression on a Genomic Scale

Joseph L. DeRisi, Vishwanath R. Iyer, Patrick O. Brown*

DNA microarrays containing virtually every gene of Saccharomyces cerevisiae were used 
to carry out a comprehensive investigation of the temporal program of gene expression 
accompanying the metabolic shift from fermentation to respiration. The expression 
profiles observed for genes with known metabolic functions pointed to features of the 
metabolic reprogramming that occur during the diauxic shift, and the expression patterns 
of many previously uncharacterized genes provided clues to their possible functions. The
same DNA microarrays were also used to identify genes whose expression was affected
by deletion of the transcriptional co-repressor TUP1 or overexpression of the transcriptional
activator YAP1. These results demonstrate the feasibility and utility of this approach to
genomewide exploration of gene expression patterns.



The complete sequences of nearly a dozen 
microbial genomes are known, and in the 
next several years we expect to know the 
complete genome sequences of several 
metazoans, including the human genome. 
Defining the role of each gene in these 
genomes will be a formidable task, and un- 
derstanding how the genome functions as a 
whole in the complex natural history of a 
living organism presents an even greater 
challenge. 

Knowing when and where a gene is 
expressed often provides a strong clue as to 
its biological role. Conversely, the pattern 
of genes expressed in a cell can provide 
detailed information about its state. Al- 
though regulation of protein abundance in 
a cell is by no means accomplished solely 
by regulation of mRNA, virtually all dif- 
ferences in cell type or state are correlated 
with changes in the mRNA levels of many 
genes. This is fortuitous because the only
specific reagent required to measure the 
abundance of the mRNA for a specific 
gene is a cDNA sequence. DNA microar- 
rays, consisting of thousands of individual 
gene sequences printed in a high-density 
array on a glass microscope slide (1, 2),
provide a practical and economical tool 
for studying gene expression on a very 
large scale (3-6). 

Saccharomyces cerevisiae is an especially favorable organism in which to conduct a systematic investigation of gene expression. The genes are easy to recognize in the genome sequence, cis regulatory elements are generally compact and close to the transcription units, much is already known about its genetic regulatory mechanisms, and a powerful set of tools is available for its analysis.

Department of Biochemistry, Stanford University School of Medicine, Howard Hughes Medical Institute, Stanford, CA 94305-5428, USA.

*To whom correspondence should be addressed. E-mail: pbrown@cmgm.stanford.edu

A recurring cycle in the natural history 
of yeast involves a shift from anaerobic 
(fermentation) to aerobic (respiration) me- 
tabolism. Inoculation of yeast into a medi- 
um rich in sugar is followed by rapid growth 
fueled by fermentation, with the production 
of ethanol. When the fermentable sugar is 
exhausted, the yeast cells turn to ethanol as 
a carbon source for aerobic growth. This 
switch from anaerobic growth to aerobic 
respiration upon depletion of glucose, re- 
ferred to as the diauxic shift, is correlated 
with widespread changes in the expression 
of genes involved in fundamental cellular 
processes such as carbon metabolism, pro- 
tein synthesis, and carbohydrate storage 
(7). We used DNA microarrays to charac- 
terize the changes in gene expression that 
take place during this process for nearly the 
entire genome, and to investigate the ge- 
netic circuitry that regulates and executes 
this program. 

Yeast open reading frames (ORFs) were 
amplified by the polymerase chain reaction 
(PCR), with a commercially available set of 
primer pairs (8). DNA microarrays, con- 
taining approximately 6400 distinct DNA 
sequences, were printed onto glass slides by 
using a simple robotic printing device (9). Cells from an exponentially growing culture of yeast were inoculated into fresh medium and grown at 30°C for 21 hours. After an initial 9 hours of growth, samples were harvested at seven successive 2-hour intervals, and mRNA was isolated (10). Fluorescently labeled cDNA was prepared by reverse transcription in the presence of Cy3 (green)- or Cy5 (red)-labeled deoxyuridine triphosphate (dUTP) (11) and then hybridized to the microarrays (12). To maximize the reliability with which changes in expression levels could be discerned, we labeled cDNA prepared from cells at each successive time point with Cy5, then mixed it with a Cy3-labeled "reference" cDNA sample prepared from cells harvested at the first interval after inoculation. In this experimental design, the relative fluorescence intensity measured for the Cy3 and Cy5 fluors at each array element provides a reliable measure of the relative abundance of the corresponding mRNA in the two cell populations (Fig. 1). Data from the series of seven samples (Fig. 2), consisting of more than 43,000 expression-ratio measurements, were organized into a database to facilitate efficient exploration and analysis of the results. This database is publicly available on the Internet (13).
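Because each array element reports two fluorescence intensities, the per-gene measurement reduces to a Cy5/Cy3 ratio. The Python sketch below is a minimal illustration under stated assumptions: the function names, the optional background-subtraction step and the intensity values are hypothetical, not the authors' image-analysis software.

```python
import math


def expression_ratio(cy5: float, cy3: float,
                     cy5_bg: float = 0.0, cy3_bg: float = 0.0) -> float:
    """Cy5 (timepoint sample) / Cy3 (reference) ratio for one array element.

    The local background values are a hypothetical correction step; the
    floor of 1e-9 only guards against division by zero or negative signal.
    """
    red = max(cy5 - cy5_bg, 1e-9)
    green = max(cy3 - cy3_bg, 1e-9)
    return red / green


def log2_ratio(cy5: float, cy3: float,
               cy5_bg: float = 0.0, cy3_bg: float = 0.0) -> float:
    """log2 of the ratio: +1 means twofold induced, -1 twofold repressed."""
    return math.log2(expression_ratio(cy5, cy3, cy5_bg, cy3_bg))


if __name__ == "__main__":
    # Made-up spot intensities, for illustration only
    print(expression_ratio(4000.0, 1000.0))        # 4.0
    print(log2_ratio(500.0, 1000.0))               # -1.0
    print(log2_ratio(1100.0, 300.0, 100.0, 50.0))  # 2.0
```

Working in log2 space makes a twofold induction and a twofold repression symmetric (+1 and -1), a common convention rather than one the text specifies.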

During exponential growth in glucose-rich medium, the global pattern of gene expression was remarkably stable. Indeed, when gene expression patterns between the first two cell samples (harvested at a 2-hour interval) were compared, mRNA levels differed by a factor of 2 or more for only 19 genes (0.3%), and the largest of these differences was only 2.7-fold (14). However, as glucose was progressively depleted from the growth media during the course of the experiment, a marked change was seen in the global pattern of gene expression. mRNA levels for approximately 710 genes were induced by a factor of at least 2, and the mRNA levels for approximately 1030 genes declined by a factor of at least 2. Messenger RNA levels for 183 genes increased by a factor of at least 4, and mRNA levels for 203 genes diminished by a factor of at least 4. About half of these differentially expressed genes have no currently recognized function and are not yet named. Indeed, more than 400 of the differentially expressed genes have no apparent homology to any gene whose function is known (15). The responses of these previously uncharacterized genes to the diauxic shift therefore provide the first small clue to their possible roles.
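The fold-change tallies above (at least 2-fold or at least 4-fold, up or down) amount to thresholding each gene's ratio. A minimal Python sketch, assuming a dictionary of per-gene ratios at one timepoint; the gene names and values are invented placeholders:

```python
from typing import Dict, Tuple


def count_changed(ratios: Dict[str, float], factor: float = 2.0) -> Tuple[int, int]:
    """Count genes induced or repressed by at least `factor`.

    `ratios` maps a gene name to its expression ratio (sample / reference).
    A ratio >= factor counts as induced; a ratio <= 1/factor as repressed.
    """
    induced = sum(1 for r in ratios.values() if r >= factor)
    repressed = sum(1 for r in ratios.values() if r <= 1.0 / factor)
    return induced, repressed


if __name__ == "__main__":
    # Hypothetical ratios for five genes (not data from this study)
    example = {"GENE_A": 9.1, "GENE_B": 2.3, "GENE_C": 1.1,
               "GENE_D": 0.4, "GENE_E": 0.2}
    print(count_changed(example, 2.0))  # (2, 2)
    print(count_changed(example, 4.0))  # (1, 1)
```

For scale, 19 genes out of roughly 6400 array elements is 19/6400 ≈ 0.003, the 0.3% quoted for the first two samples.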
The global view of changes in expression of genes with known functions provides a vivid picture of the way in which the cell adapts to a changing environment. Figure 3 shows a portion of the yeast metabolic pathways involved in carbon and energy metabolism. Mapping the changes we observed in the mRNAs encoding each enzyme onto this framework allowed us to infer the redirection in the flow of metabolites through this system. We observed large inductions of the genes coding for the enzymes aldehyde dehydrogenase (ALD2) and acetyl-coenzyme A (CoA) synthase (ACS1), which function together to convert the products of alcohol dehydrogenase into acetyl-CoA, which in turn is used to fuel the tricarboxylic acid (TCA) cycle and the glyoxylate cycle. The concomitant shutdown of transcription of the genes encoding pyruvate decarboxylase and induction of pyruvate carboxylase rechannels pyruvate away from acetaldehyde and instead to oxaloacetate, where it can serve to supply the TCA cycle and gluconeogenesis. Induction of the pivotal genes PCK1, encoding phosphoenolpyruvate carboxykinase, and FBP1, encoding fructose-1,6-bisphosphatase, switches the directions of two key irreversible steps in glycolysis, reversing the flow of metabolites along the reversible steps of the glycolytic pathway toward the essential biosynthetic precursor, glucose-6-phosphate. Induction of the genes coding for the trehalose synthase and glycogen synthase complexes promotes channeling of glucose-6-phosphate into these carbohydrate storage pathways.

Just as the changes in expression of genes encoding pivotal enzymes can provide insight into metabolic reprogramming, the behavior of large groups of functionally related genes can provide a broad view of the systematic way in which the yeast cell adapts to a changing environment (Fig. 4). Several classes of genes, such as cytochrome c-related genes and those involved in the TCA/glyoxylate cycle and carbohydrate storage, were coordinately induced by glucose exhaustion. In contrast, genes devoted to protein synthesis, including ribosomal proteins, tRNA synthetases, and translation, elongation, and initiation factors, exhibited a coordinated decrease in expression. More than 95% of ribosomal genes showed at least twofold decreases in expression during the diauxic shift (Fig. 4) (13). A noteworthy and illuminating exception was that the genes encoding mitochondrial ribosomal proteins were generally induced rather than repressed after glucose limitation, highlighting the requirement for mitochondrial biogenesis (13). As more is learned about the functions of every gene in the yeast genome, the ability to gain insight into a cell's response to a changing environment through its global gene expression patterns will become increasingly powerful.
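Statements such as "more than 95% of ribosomal genes showed at least twofold decreases" summarize a whole functional class at once. The sketch below shows one way to compute such a fraction, assuming a gene-to-ratio table and a list of class members; the class membership and values are hypothetical, and the text does not say how its gene classes were assembled.

```python
from typing import Dict, Iterable


def fraction_repressed(ratios: Dict[str, float], members: Iterable[str],
                       factor: float = 2.0) -> float:
    """Fraction of a functional class whose ratio fell at least `factor`-fold.

    Class members with no measurement in `ratios` are skipped.
    """
    measured = [ratios[g] for g in members if g in ratios]
    if not measured:
        return 0.0
    down = sum(1 for r in measured if r <= 1.0 / factor)
    return down / len(measured)


if __name__ == "__main__":
    # Hypothetical class of four measured genes plus one unmeasured member
    ratios = {"RP_a": 0.3, "RP_b": 0.45, "RP_c": 0.6, "RP_d": 0.2}
    members = ["RP_a", "RP_b", "RP_c", "RP_d", "RP_missing"]
    print(fraction_repressed(ratios, members))  # 0.75
```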

Several distinct temporal patterns of expression could be recognized, and sets of genes could be grouped on the basis of the similarities in their expression patterns. The characterized members of each of these groups also shared important similarities in their functions. Moreover, in most cases, common regulatory mechanisms could be inferred for sets of genes with similar expression profiles. For example, seven genes showed a late induction profile, with mRNA levels increasing by more than ninefold at the last timepoint but less than threefold at the preceding timepoint (Fig. 5B). All of these genes were known to be glucose-repressed, and five of the seven were previously noted to share a common upstream activating sequence (UAS), the carbon source response element (CSRE) (16-20). A search in the promoter regions of the remaining two genes, ACR1 and IDP2, revealed that ACR1, a gene essential for ACS1 activity, also possessed a consensus CSRE motif, but interestingly, IDP2 did not. A search of the entire yeast genome sequence for the consensus CSRE motif revealed only four additional candidate genes, none of which showed a similar induction.
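The late-induction rule used for Fig. 5B (more than ninefold at the last timepoint, less than threefold at the preceding one) can be written as a simple filter over each gene's time series. A sketch in Python, assuming one ratio per harvest relative to the reference sample; the profiles are made up:

```python
from typing import Dict, List, Sequence


def late_induced(profiles: Dict[str, Sequence[float]],
                 last_min: float = 9.0, prev_max: float = 3.0) -> List[str]:
    """Genes whose ratio exceeds `last_min` at the final timepoint while
    staying below `prev_max` at the timepoint just before it."""
    hits = []
    for gene, series in profiles.items():
        if len(series) < 2:
            continue  # need at least two timepoints to compare
        if series[-1] > last_min and series[-2] < prev_max:
            hits.append(gene)
    return sorted(hits)


if __name__ == "__main__":
    # Invented seven-point ratio series (one value per harvest)
    profiles = {
        "late":  [1.0, 1.1, 1.0, 1.2, 1.5, 2.4, 12.0],
        "early": [1.0, 2.0, 5.0, 9.5, 11.0, 12.0, 12.5],
        "flat":  [1.0, 1.0, 0.9, 1.1, 1.0, 1.0, 1.0],
    }
    print(late_induced(profiles))  # ['late']
```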

Fig. 1. Yeast genome microarray. The actual size of the microarray is 18 mm by 18 mm. The microarray was printed as described (9). This image was obtained with the same fluorescent scanning confocal microscope used to collect all the data we report (49). A fluorescently labeled cDNA probe was prepared from mRNA isolated from cells harvested shortly after inoculation (culture density of <5 × 10^6 cells/ml and media glucose level of 19 g/liter) by reverse transcription in the presence of Cy3-dUTP. Similarly, a second probe was prepared from mRNA isolated from cells taken from the same culture 9.5 hours later (culture density of ~2 × 10^8 cells/ml, with a glucose level of <0.2 g/liter) by reverse transcription in the presence of Cy5-dUTP. In this image, hybridization of the Cy3-dUTP-labeled cDNA (that is, mRNA expression at the initial timepoint) is represented as a green signal, and hybridization of Cy5-dUTP-labeled cDNA (that is, mRNA expression at 9.5 hours) is represented as a red signal. Thus, genes induced or repressed after the diauxic shift appear in this image as red and green spots, respectively. Genes expressed at roughly equal levels before and after the diauxic shift appear in this image as yellow spots.

Examples from additional groups of genes that shared expression profiles are illustrated in Fig. 5, C through F. The sequences upstream of the named genes in Fig. 5C all contain stress response elements (STRE) and, with the exception of HSP42, have previously been shown to be controlled at least in part by these elements (21-24). Inspection of the sequences upstream of HSP42 and the two uncharacterized genes shown in Fig. 5C, YKL026c, a hypothetical protein with similarity to glutathione peroxidase, and YGR043c, a putative transaldolase, revealed that each of these genes also possesses repeated upstream copies of the stress-responsive CCCCT motif. Of the 13 additional genes in the yeast genome that shared this expression profile [including HSP30, ALD2, OM45, and 10 uncharacterized ORFs (25)], nine contained one or more recognizable STRE sites in their upstream regions.
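Checking for repeated copies of the CCCCT motif is a plain string search over each upstream region. A minimal Python sketch; the sequence is invented, counting both strands is an assumption (the text does not say how sites were counted), and real STRE searches may apply extra criteria:

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def count_motif(seq: str, motif: str = "CCCCT", both_strands: bool = True) -> int:
    """Count possibly overlapping occurrences of `motif` in `seq`.

    With both_strands=True the reverse complement (AGGGG for CCCCT) is also
    counted, so a site on either strand is found by scanning one string.
    """
    seq = seq.upper()
    targets = {motif.upper()}
    if both_strands:
        targets.add(reverse_complement(motif))
    return sum(1 for i in range(len(seq)) for t in targets if seq.startswith(t, i))


if __name__ == "__main__":
    upstream = "TTACCCCTGAAGGGGATCCCCTA"  # made-up upstream sequence
    print(count_motif(upstream))                      # 3
    print(count_motif(upstream, both_strands=False))  # 2
```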

The heterotrimeric transcriptional acti- 
vator complex HAP2,3,4 has been shown 
to be responsible for induction of several 
genes important for respiration (26-28). 
This complex binds a degenerate consensus 
sequence known as the CCAAT box (26). 
Computer analysis, using the consensus se- 
quence TNRYTGGB (29), has suggested 
that a large number of genes involved in 
respiration may be specific targets of 
HAP2,3,4 (30). Indeed, a putative
HAP2,3,4 binding site could be found in 
the sequences upstream of each of the seven 
cytochrome c-related genes that showed 
the greatest magnitude of induction (Fig. 
5D). Of 12 additional cytochrome c-related 
genes that were induced, HAP2,3,4 binding 
sites were present in all but one. Signifi- 
cantly, we found that transcription of 
HAP4 itself was induced nearly ninefold 
concomitant with the diauxic shift. 
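The consensus TNRYTGGB is written in IUPAC ambiguity codes (N = any base, R = A or G, Y = C or T, B = C, G or T), so a genome-scale scan can translate it into a regular expression. A forward-strand-only Python sketch under that assumption; the promoter sequence is invented, and this is not the program used in (29) or (30):

```python
import re
from typing import List

# IUPAC nucleotide codes mapped to regular-expression character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(consensus: str) -> str:
    return "".join(IUPAC[ch] for ch in consensus.upper())


def find_sites(seq: str, consensus: str = "TNRYTGGB") -> List[int]:
    """0-based start positions of consensus matches on the given strand.

    The lookahead (?=...) lets overlapping matches be reported.
    """
    pattern = re.compile("(?=" + iupac_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    promoter = "GCGTCATTGGTCAG"  # made-up sequence
    print(iupac_to_regex("TNRYTGGB"))  # T[ACGT][AG][CT]TGG[CGT]
    print(find_sites(promoter))        # [3]
```

Scanning the reverse complement of each upstream region as well would catch sites on the opposite strand.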

Control of ribosomal protein biogenesis is mainly exerted at the transcriptional level, through the presence of a common upstream-activating element (UASrpg) that is recognized by the Rap1 DNA-binding protein (31, 32). The expression profiles of seven ribosomal proteins are shown in Fig. 5F. A search of the sequences upstream of all seven genes revealed consensus Rap1-binding motifs (33). It has been suggested that declining Rap1 levels in the cell during starvation may be responsible for the decline in ribosomal protein gene expression (34). Indeed, we observed that the abundance of RAP1 mRNA diminished by 4.4-fold at about the time of glucose exhaustion.

Of the 149 genes that encode known or putative transcription factors, only two, HAP4 and SIP4, were induced by more than threefold at the diauxic shift. SIP4 encodes a DNA-binding transcriptional activator that has been shown to interact with Snf1, the "master regulator" of glucose repression (35). The eightfold induction of SIP4 upon depletion of glucose strongly suggests a role in the induction of downstream genes at the diauxic shift.

Although most of the transcriptional 
responses that we observed were not pre- 
viously known, the responses of many 
genes during the diauxic shift have been 
described. Comparison of the results we 
obtained by DNA microarray hybridiza- 
tion with previously reported results there- 
fore provided a strong test of the sensitiv- 
ity and accuracy of this approach. The 
expression patterns we observed for previ- 
ously characterized genes showed almost 
perfect concordance with previously pub- 
lished results (36). Moreover, the differ- 
ential expression measurements obtained 
by DNA microarray hybridization were re- 
producible in duplicate experiments. For 
example, the remarkable changes in gene 
expression between cells harvested imme- 
diately after inoculation and immediately 
after the diauxic shift (the first and sixth 
intervals in this time series) were mea- 
sured in duplicate, independent DNA mi- 
croarray hybridizations. The correlation 
coefficient for two complete sets of expres- 
sion ratio measurements was 0.87, and for 
more than 95% of the genes, the expres- 



sion ratios measured in these duplicate 
experiments differed by less than a factor 
of 2. However, in a few cases, there were 
discrepancies between our results and pre- 
vious results, pointing to technical limita- 
tions that will need to be addressed as 
DNA microarray technology advances 
(37, 38). Despite the noted exceptions, 
the high concordance between the results 
we obtained in these experiments and 
those of previous studies provides confi- 
dence in the reliability and thoroughness 
of the survey. 
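The two agreement figures quoted here, a correlation coefficient and the share of genes whose duplicate ratios differ by less than a factor of 2, can be computed from paired ratio lists. A Python sketch; computing the correlation on log2 ratios is an assumption (the text does not state the scale behind the 0.87 figure), and the input values are invented:

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def replicate_agreement(rep1: Sequence[float],
                        rep2: Sequence[float]) -> Tuple[float, float]:
    """Compare paired expression ratios from two hybridizations.

    Returns (Pearson r computed on log2 ratios, fraction of genes whose two
    ratios differ by less than a factor of 2, i.e. |log2 difference| < 1).
    """
    if len(rep1) != len(rep2) or len(rep1) < 2:
        raise ValueError("need two equal-length series of at least two ratios")
    x = [math.log2(r) for r in rep1]
    y = [math.log2(r) for r in rep2]
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("ratios must vary across genes")
    r = cov / (sx * sy)
    within_2x = sum(1 for a, b in zip(x, y) if abs(a - b) < 1.0) / len(x)
    return r, within_2x


if __name__ == "__main__":
    # Hypothetical duplicate measurements for four genes
    r, frac = replicate_agreement([1.0, 2.0, 4.0, 8.0], [1.0, 2.0, 4.0, 32.0])
    print(round(r, 3), frac)  # 0.956 0.75
```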

The changes in gene expression during this diauxic shift are complex and involve integration of many kinds of information about the nutritional and metabolic state of the cell. The large number of genes whose expression is altered and the diversity of temporal expression profiles observed in this experiment highlight the challenge of understanding the underlying regulatory mechanisms. One approach to defining the contributions of individual regulatory genes to a complex program of this kind is to use DNA microarrays to identify genes whose expression is affected by mutations in each putative regulatory gene. As a test of this strategy, we analyzed the genomewide changes in gene expression that result from deletion of the TUP1 gene. Transcriptional repression of many genes by glucose requires the DNA-binding repressor Mig1 and is mediated by recruiting the transcriptional co-repressors Tup1 and Cyc8/Ssn6 (39). Tup1 has also been implicated in repression of oxygen-regulated, mating-type-specific, and DNA-damage-inducible genes (40).

Fig. 2. The section of the array indicated by the gray box in Fig. 1 is shown for each of the experiments described here. Representative genes are labeled. In each of the arrays used to analyze gene expression during the diauxic shift, red spots represent genes that were induced relative to the initial timepoint, and green spots represent genes that were repressed relative to the initial timepoint. In the arrays used to analyze the effects of the tup1Δ mutation and YAP1 overexpression, red spots represent genes whose expression was increased, and green spots represent genes whose expression was decreased by the genetic modification. Note that distinct sets of genes are induced and repressed in the different experiments. The complete images of each of these arrays can be viewed on the Internet (13). Cell density as measured by optical density (OD) at 600 nm was used to measure the growth of the culture.







Fig. 3. Metabolic reprogramming inferred from global analysis of changes in gene expression. Only key metabolic intermediates are identified. The yeast genes encoding the enzymes that catalyze each step in this metabolic circuit are identified by name in the boxes. The genes encoding succinyl-CoA synthase and glycogen-debranching enzyme have not been explicitly identified, but the ORFs YGR244 and YPR184 show significant homology to known succinyl-CoA synthase and glycogen-debranching enzymes, respectively, and are therefore included in the corresponding steps in this figure. Red boxes with white lettering identify genes whose expression increases in the diauxic shift. Green boxes with dark green lettering identify genes whose expression diminishes in the diauxic shift. The magnitude of induction or repression is indicated for these genes. For multimeric enzyme complexes, such as succinate dehydrogenase, the indicated fold-induction represents an unweighted average of all the genes listed in the box. Black and white boxes indicate no significant differential expression (less than twofold). The direction of the arrows connecting reversible enzymatic steps indicates the direction of the flow of metabolic intermediates, inferred from the gene expression pattern, after the diauxic shift. Arrows representing steps catalyzed by genes whose expression was strongly induced are highlighted in red. The broad gray arrows represent major increases in the flow of metabolites after the diauxic shift, inferred from the indicated changes in gene expression.
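The caption's rule for multimeric complexes (an unweighted average over the genes in a box, with less than twofold treated as no significant change) can be sketched as follows. Averaging raw ratios rather than log ratios, and applying the twofold cutoff to that average, are assumptions for illustration; the subunit values are invented.

```python
from statistics import mean
from typing import Sequence, Tuple


def box_call(ratios: Sequence[float], threshold: float = 2.0) -> Tuple[float, str]:
    """Summarize one enzyme box in a pathway diagram.

    The box value is the unweighted mean of the ratios of all genes listed
    in it (for a one-gene box, just that gene's ratio); a twofold cutoff then
    decides the box colour.
    """
    avg = mean(ratios)
    if avg >= threshold:
        return avg, "induced (red box)"
    if avg <= 1.0 / threshold:
        return avg, "repressed (green box)"
    return avg, "no significant change (black and white box)"


if __name__ == "__main__":
    # Hypothetical subunit ratios for a four-gene complex
    print(box_call([6.0, 8.0, 7.0, 7.0]))  # (7.0, 'induced (red box)')
```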
Wild-type yeast cells and cells bearing a deletion of the TUP1 gene (tup1Δ) were grown in parallel cultures in rich medium containing glucose as the carbon source. Messenger RNA was isolated from exponentially growing cells from the two populations and used to prepare cDNA labeled with Cy3 (green) and Cy5 (red), respectively (11). The labeled probes were mixed and simultaneously hybridized to the microarray. Red spots on the microarray therefore represented genes whose transcription was induced in the tup1Δ strain, and thus presumably repressed by Tup1 (41). A representative section of the microarray (Fig. 2, bottom middle panel) illustrates that the genes whose expression was affected by the tup1Δ mutation were, in general, distinct from those induced upon glucose exhaustion [complete images of all the arrays shown in Fig. 2 are available on the Internet (13)]. Nevertheless, 34 (10%) of the genes that were induced by a factor of at least 2 after the diauxic shift were similarly induced by deletion of TUP1, suggesting that these genes may be subject to TUP1-mediated repression by glucose. For example, SUC2, the gene encoding invertase, and all five hexose transporter genes that were induced during the course of the diauxic shift were similarly induced, in duplicate experiments, by the deletion of TUP1.

The set of genes affected by Tup1 in this experiment also included α-glucosidases, the mating-type-specific genes MFA1 and MFA2, and the DNA damage-inducible RNR2 and RNR4, as well as genes involved in flocculation and many genes of unknown function. The hybridization signal corresponding to expression of TUP1 itself was also severely reduced because of the (incomplete) deletion of the transcription unit in the tup1Δ strain, providing a positive control in the experiment (42).
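Finding genes induced both after the diauxic shift and by the TUP1 deletion is a set intersection over two thresholded ratio tables. A Python sketch, assuming per-gene ratios from each experiment keyed by the same gene names; the gene names and values are hypothetical:

```python
from typing import Dict, Set, Tuple


def induced_set(ratios: Dict[str, float], factor: float = 2.0) -> Set[str]:
    """Genes whose ratio is at least `factor`."""
    return {g for g, r in ratios.items() if r >= factor}


def shared_induction(shift: Dict[str, float], deletion: Dict[str, float],
                     factor: float = 2.0) -> Tuple[Set[str], float]:
    """Genes induced >= factor in both experiments, plus the fraction of the
    diauxic-shift-induced set that they represent."""
    shift_up = induced_set(shift, factor)
    both = shift_up & induced_set(deletion, factor)
    frac = len(both) / len(shift_up) if shift_up else 0.0
    return both, frac


if __name__ == "__main__":
    shift = {"g1": 5.0, "g2": 3.0, "g3": 2.5, "g4": 1.0}     # time course
    tup1_del = {"g1": 4.0, "g2": 1.1, "g3": 0.8, "g4": 6.0}  # deletion vs wild type
    both, frac = shared_induction(shift, tup1_del)
    print(sorted(both), round(frac, 2))  # ['g1'] 0.33
```

Gene g4, induced only in the deletion strain, falls outside the intersection, as do shift-induced genes that Tup1 does not affect.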

Many of the transcriptional targets of Tup1 fell into sets of genes with related biochemical functions. For instance, although only about 3% of all yeast genes appeared to be TUP1-repressed by a factor of more than 2 in duplicate experiments under these conditions, 6 of the 13 genes that have been implicated in flocculation (15) showed a reproducible increase in expression of at least twofold when TUP1 was deleted. Another group of related genes that appeared to be subject to TUP1 repression encodes the serine-rich cell wall mannoproteins, such as Tip1 and Tir1/Srp1, which are induced by cold shock and other stresses (43), and similar, serine-poor proteins, the seripauperins (44). Messenger RNA levels for 23 of the 26 genes in this group were reproducibly elevated by at least 2.5-fold in the tup1Δ strain, and 18 of these genes were induced by more than sevenfold when TUP1 was deleted. In contrast, none of 83 genes that could be classified as putative regulators of the cell division cycle were induced more than twofold by deletion of TUP1. Thus, despite the diversity of the regulatory systems that employ Tup1, most of the genes that it regulates under these conditions fall into a limited number of distinct functional classes.
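One way to put a number on the observation that 6 of 13 flocculation genes were TUP1-repressed, when only about 3% of all genes were, is a hypergeometric tail probability. The text reports no such test, so the sketch below is only an illustration; the genome size of 6200 genes is an assumption (the text says only "nearly every gene in yeast" and "more than 6000 genes"), and the function name is hypothetical.

```python
from math import comb


def hypergeom_tail(k, population, successes, draws):
    """P(X >= k) when `draws` genes are picked without replacement from
    `population` genes, of which `successes` are hits (here, TUP1-repressed)."""
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, draws)


if __name__ == "__main__":
    genome = 6200                      # assumed number of genes on the array
    repressed = round(0.03 * genome)   # ~3% repressed by more than twofold
    p = hypergeom_tail(6, genome, repressed, 13)  # 6 of 13 flocculation genes
    print(f"expected hits: {13 * repressed / genome:.2f}, P(X >= 6) = {p:.2e}")
```

With these assumptions fewer than one flocculation gene would be expected by chance, so six is far outside what a random draw of 13 genes would give.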

Because the microarray allows us to monitor expression of nearly every gene in yeast, we can, in principle, use this approach to identify all the transcriptional targets of a regulatory protein like Tup1. It is important to note, however, that in any single experiment of this kind we can only recognize those target genes that are normally repressed (or induced) under the conditions of the experiment. For instance, the experiment described here analyzed a MATα strain in which MFA1 and MFA2, the genes encoding the a-factor mating pheromone precursor, are normally repressed. In the isogenic tup1Δ strain, these genes were inappropriately expressed, reflecting the role that Tup1 plays in their repression. Had we instead carried out this experiment with a MATa strain (in which expression of MFA1 and MFA2 is not repressed), it would not have been possible to conclude anything regarding the role of Tup1 in the repression of these genes. In addition, we cannot distinguish indirect effects of the chronic absence of Tup1 in the mutant strain from effects directly attributable to its participation in repressing the transcription of a gene.

Another simple route to modulating the activity of a regulatory factor is to overexpress the gene that encodes it. YAP1 encodes a DNA-binding transcription factor belonging to the bZIP class of DNA-binding proteins. Overexpression of YAP1 in yeast confers increased resistance to hydrogen peroxide, o-phenanthroline, heavy metals, and osmotic stress (45). We analyzed differential gene expression between a wild-type strain bearing a control plasmid and a strain with a plasmid expressing YAP1 under the control of the strong GAL1-10 promoter, both grown in galactose (that is, a condition that induces YAP1 overexpression). Complementary DNA from the control and YAP1-overexpressing strains, labeled with Cy3 and Cy5, respectively, was prepared from mRNA isolated from the two strains and hybridized to the microarray. Thus, red spots on the array represent genes that were induced in the strain overexpressing YAP1.

Of the 17 genes whose mRNA levels increased by more than threefold when YAP1 was overexpressed in this way, five bear homology to aryl-alcohol oxidoreductases (Fig. 2 and Table 1). An additional four of the genes in this set also belong to the general class of dehydrogenases/oxidoreductases. Very little is known about the role of aryl-alcohol oxidoreductases in S. cerevisiae, but these enzymes have been isolated from ligninolytic fungi, in which they participate in coupled redox reactions, oxidizing aromatic and aliphatic unsaturated alcohols to aldehydes with the production of hydrogen peroxide (46, 47). The fact that a remarkable fraction of the targets identified in this experiment belong to the same small functional group of oxidoreductases suggests that these genes might play an important protective role during oxidative stress. Transcription of a small number of genes was reduced in the strain overexpressing Yap1. Interestingly, many of these genes encode sugar permeases or enzymes involved in inositol metabolism.

Fig. 4. Coordinated regulation of functionally related genes. The curves represent the average induction or repression ratios for all the genes in each indicated group. The total number of genes in each group was as follows: ribosomal proteins, 112; translation elongation and initiation factors, 25; tRNA synthetases (excluding mitochondrial synthetases), 17; glycogen and trehalose synthesis and degradation, 15; cytochrome c oxidase and reductase proteins, 19; and TCA- and glyoxylate-cycle enzymes, 24.

Table 1. Genes induced by YAP1 overexpression. This list includes all the genes for which mRNA levels increased by more than twofold upon YAP1 overexpression in both of two duplicate experiments, and for which the average increase in mRNA level in the two experiments was greater than threefold (50). Positions of the canonical Yap1 binding sites upstream of the start codon, when present, and the average fold-increase in mRNA levels measured in the two experiments are indicated.

We searched for Yap1-binding sites (TTACTAA or TGACTAA) in the sequences upstream of the target genes we identified (48). About two-thirds of the genes that were induced by more than threefold upon Yap1 overexpression had one or more binding sites within 600 bases upstream of the start codon (Table 1), suggesting that they are directly regulated by Yap1. The absence of canonical Yap1-binding sites upstream of the others may reflect an ability of Yap1 to bind sites that differ from the canonical binding sites, perhaps in cooperation with other factors, or, less likely, may represent an indirect effect of Yap1 overexpression, mediated by one or more intermediary factors. Yap1 sites were found only four times in the corresponding region of an arbitrary set of 30 genes that were not differentially regulated by Yap1.
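The site search described above can be sketched as a plain string scan over the 600 bases preceding each start codon. This is not the authors' procedure, which the text does not detail: whether the reverse strand was also scanned is not stated, so the sketch checks both strands as an explicit assumption, and the input is assumed to be an upstream sequence that ends immediately before the ATG. Function names are hypothetical.

```python
YAP1_SITES = ("TTACTAA", "TGACTAA")  # canonical sites quoted in the text

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_yap1_sites(upstream, window=600, both_strands=True):
    """Return sorted (position, motif) pairs for sites in the last `window`
    bases of `upstream`, assumed to end just before the start codon.
    Positions are negative offsets of the motif's first base from the ATG."""
    region = upstream.upper()[-window:]
    motifs = set(YAP1_SITES)
    if both_strands:  # assumption: also report sites on the opposite strand
        motifs |= {reverse_complement(m) for m in YAP1_SITES}
    hits = []
    for motif in motifs:
        start = region.find(motif)
        while start != -1:  # collect every occurrence, including overlaps
            hits.append((start - len(region), motif))
            start = region.find(motif, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    print(find_yap1_sites("GGG" + "TTACTAA" + "C" * 10))  # [(-17, 'TTACTAA')]
```

Counting hits in the same window for a set of genes that are not differentially regulated, as the text does for 30 control genes, gives a rough background rate against which the target genes can be compared.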

Use of a DNA microarray to characterize the transcriptional consequences of mutations affecting the activity of regulatory molecules provides a simple and powerful approach to dissection and characterization of regulatory pathways and networks. This strategy also has an important practical application in drug screening. Mutations in specific genes encoding candidate drug targets can serve as surrogates for the ideal chemical inhibitor or modulator of their activity. DNA microarrays can be used to define the resulting signature pattern of alterations in gene expression, and can then be used in an assay to screen for compounds that reproduce the desired signature pattern.
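The signature-matching screen proposed above can be illustrated by scoring each compound's expression profile against a reference signature obtained from a mutant strain. The sketch below uses the Pearson correlation of log2 ratios as the similarity score; that choice, the function names, and the example data are assumptions for illustration, since the text does not specify how a match would be scored.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_compounds(signature, profiles):
    """Rank compounds by how well their profiles reproduce `signature`.

    signature: dict gene -> expression ratio (mutant / wild type)
    profiles:  dict compound -> dict gene -> ratio (treated / untreated);
               each profile is assumed to cover every signature gene.
    """
    genes = sorted(signature)
    ref = [math.log2(signature[g]) for g in genes]
    scores = {
        name: pearson(ref, [math.log2(prof[g]) for g in genes])
        for name, prof in profiles.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical signature and two hypothetical compound profiles.
    sig = {"g1": 4.0, "g2": 0.5, "g3": 1.0, "g4": 2.0}
    profiles = {"mimic": dict(sig), "opposite": {g: 1 / r for g, r in sig.items()}}
    print(rank_compounds(sig, profiles))
```

Working on log ratios makes a twofold induction and a twofold repression equal in magnitude, so a compound that reverses the signature scores near -1 rather than merely low.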

DNA microarrays provide a simple and economical way to explore gene expression patterns on a genomic scale. The hurdles to extending this approach to any other organism are minor. The equipment required for fabricating and using DNA microarrays (9) consists of components that were chosen for their modest cost and simplicity. It was feasible for a small group to accomplish the amplification of more than 6000 genes in about 4 months and, once the amplified gene sequences were in hand, only 2 days were required to print a set of 110 microarrays of 6400 elements each. Probe preparation, hybridization, and fluorescent imaging are also simple procedures. Even conceptually simple experiments, such as those described here, can yield vast amounts of information. The value of the information from each experiment of this kind will progressively increase as more is learned about the functions of each gene and as additional experiments define the global changes in gene expression in diverse other natural processes and genetic perturbations. Perhaps the greatest challenge now is to develop efficient methods for organizing, distributing, interpreting, and extracting insights from the large volumes of data these experiments will provide.
concentration in the media. (B) Seven genes exhibited a strong induction (greater than ninefold) only at the last time point (20.5 hours). With the exception of IDP2, each of these genes has a CSRE UAS. There were no additional genes observed to match this profile. (C) Seven members of a class of genes marked by early induction with a peak in mRNA levels at 18.5 hours. Each of these genes contains STRE motif repeats in its upstream promoter region. (D) Cytochrome c oxidase and ubiquinol cytochrome c reductase genes. Marked by an induction coincident with the diauxic shift, each of these genes contains a consensus binding motif for the HAP2,3,4 protein complex. At least 17 genes shared a similar expression profile. (E) SAM1, GPP1, and several genes of unknown function are repressed before the diauxic shift, and continue to be repressed upon entry into stationary phase. (F) Ribosomal protein genes comprise a large class of genes that are repressed upon depletion of glucose. Each of the genes profiled here contains one or more RAP1-binding motifs upstream of its promoter. RAP1 is a transcriptional regulator of most ribosomal proteins.



www.sciencemag.org • SCIENCE • VOL. 278 • 24 OCTOBER 1997 






tion, the bound DNA was denatured by a 2-min incubation in distilled water at ~95°C. The slides were then transferred into a bath of 100% ethanol at room temperature, rinsed, and then spun dry in a clinical centrifuge. Slides were stored in a closed box at room temperature until used.

10. YPD medium (8 liters), in a 10-liter fermentation vessel, was inoculated with 2 ml of a fresh overnight culture of yeast strain DBY7286 (MATa, ura3, GAL2). The fermentor was maintained at 30°C with constant agitation and aeration. The glucose content of the media was measured with a UV test kit (Boehringer Mannheim, catalog number 716251). Cell density was measured by OD at 600-nm wavelength. Aliquots of culture were rapidly withdrawn from the fermentation vessel by peristaltic pump, spun down at room temperature, and then flash frozen with liquid nitrogen. Frozen cells were stored at -80°C.

11. Cy3-dUTP or Cy5-dUTP (Amersham) was incorporated during reverse transcription of 1.25 μg of polyadenylated [poly(A)+] RNA, primed by a dT(16) oligomer. This mixture was heated to 70°C for 10 min, and then transferred to ice. A premixed solution, consisting of 200 U Superscript II (Gibco), buffer, deoxyribonucleoside triphosphates, and fluorescent nucleotides, was added to the RNA. Nucleotides were used at these final concentrations: 500 μM for dATP, dCTP, and dGTP and 200 μM for dTTP. Cy3-dUTP and Cy5-dUTP were used at a final concentration of 100 μM. The reaction was then incubated at 42°C for 2 hours. Unincorporated fluorescent nucleotides were removed by first diluting the reaction mixture with 470 μl of 10 mM tris-HCl (pH 8.0)/1 mM EDTA and then concentrating the mix to ~5 μl using Centricon-30 microconcentrators (Amicon).

12. Purified, labeled cDNA was resuspended in 11 μl of 3.5× SSC containing 10 μg poly(dA) and 0.3 μl of 10% SDS. Before hybridization, the solution was boiled for 2 min and then allowed to cool to room temperature. The solution was applied to the microarray under a cover slip, and the slide was placed in a custom hybridization chamber, which was then incubated for ~8 to 12 hours in a water bath at 62°C. Before scanning, slides were washed in 2× SSC, 0.2% SDS for 5 min, and then in 0.05× SSC for 1 min. Slides were dried before scanning by centrifugation at 500 rpm in a Beckman CS-6R centrifuge.

13. The complete data set is available on the Internet at cmgm.stanford.edu/pbrown

14. For 95% of all the genes analyzed, the mRNA levels measured in cells harvested at the first and second intervals after inoculation differed by a factor of less than 1.5. The correlation coefficient for the comparison between mRNA levels measured for each gene in these two different mRNA samples was 0.98. When duplicate mRNA preparations from the same cell sample were compared in the same way, the correlation coefficient between the expression levels measured for the two samples by comparative hybridization was 0.99.
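The first statistic in this note, the share of genes whose mRNA levels in two samples differ by less than a factor of 1.5, can be computed along the lines of the sketch below. It assumes paired, positive expression values keyed by gene name; the function name and the example values in the test are hypothetical.

```python
def fraction_within_fold(sample_a, sample_b, fold=1.5):
    """Fraction of shared genes whose levels in the two samples differ by
    less than `fold` in either direction (values assumed positive)."""
    genes = sample_a.keys() & sample_b.keys()
    close = sum(
        1
        for g in genes
        # Larger over smaller gives the fold difference regardless of direction.
        if max(sample_a[g], sample_b[g]) / min(sample_a[g], sample_b[g]) < fold
    )
    return close / len(genes)
```

Taking the larger value over the smaller makes a 1.4-fold rise and a 1.4-fold fall count the same way.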

15. The numbers and identities of known and putative genes, and their homologies to other genes, were gathered from the following public databases: Saccharomyces Genome Database (genome-www.stanford.edu), Yeast Protein Database (quest7.proteome.com), and Munich Information Centre for Protein Sequences (speedy.mips.biochem.mpg.de/mips/yeast/index.htmlx).

16. A. Scholer and H. J. Schüller, Mol. Cell. Biol. 14, 3613 (1994).

17. S. Kratzer and H. J. Schüller, Gene 161, 75 (1995).

18. R. J. Haselbeck and H. L. McAlister, J. Biol. Chem. 268, 12116 (1993).

19. M. Fernandez, E. Fernandez, R. Rodicio, Mol. Gen. Genet. 242, 727 (1994).

20. A. Hartig et al., Nucleic Acids Res. 20, 5677 (1992).

21. P. M. Martinez et al., EMBO J. 15, 2227 (1996).

22. J. C. Varela, U. M. Praekelt, P. A. Meacock, R. J. Planta, W. H. Mager, Mol. Cell. Biol. 15, 6232 (1995).

23. H. Ruis and C. Schüller, BioEssays 17, 959 (1995).

24. J. L. Parrou, M. A. Teste, J. Francois, Microbiology 143, 1891 (1997).



25. This expression profile was defined as having an induction of greater than 10-fold at 16.5 hours and less than 11-fold at 20.5 hours.

26. S. L. Forsburg and L. Guarente, Genes Dev. 3, 1166 (1989).

27. J. T. Olesen and L. Guarente, ibid. 4, 1714 (1990).

28. M. Rosenkrantz, C. S. Kell, E. A. Pennell, L. J. Devenish, Mol. Microbiol. 13, 119 (1994).

29. Single-letter abbreviations for the amino acid residues are as follows: A, Ala; C, Cys; D, Asp; E, Glu; F, Phe; G, Gly; H, His; I, Ile; K, Lys; L, Leu; M, Met; N, Asn; P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr; V, Val; W, Trp; and Y, Tyr. The nucleotide codes are as follows: B = C, G, or T; N = G, A, T, or C; R = A or G; and Y = C or T.

30. C. Fondrat and A. Kalogeropoulos, Comput. Appl. Biosci. 12, 363 (1996).

31. D. Shore, Trends Genet. 10, 408 (1994).

32. R. J. Planta and H. A. Raué, ibid. 4, 64 (1988).

33. The degenerate consensus sequence VYCYRNNCMNH was used to search for potential RAP1-binding sites. The exact consensus, as defined by (30), is WACAYCCRTACATYW, with up to three differences allowed.
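A degenerate consensus such as VYCYRNNCMNH can be searched by expanding each nucleotide code into the set of bases it allows. The sketch below does this with a regular expression for exact degenerate matches, and with a position-by-position count for the "up to three differences" criterion applied to WACAYCCRTACATYW. It is an illustration, not the search program cited in (30); it assumes the standard IUPAC ambiguity codes (note 29 lists only some of them) and scans one strand only. Function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes -> allowed bases
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_regex(consensus):
    """Turn an IUPAC consensus into a regex of character classes."""
    return "".join(f"[{IUPAC[code]}]" for code in consensus.upper())


def find_degenerate(seq, consensus):
    """Start positions of exact degenerate matches; the lookahead lets
    overlapping matches be reported."""
    pattern = re.compile(f"(?=({consensus_to_regex(consensus)}))")
    return [m.start() for m in pattern.finditer(seq.upper())]


def mismatches(window, consensus):
    """Number of positions where the base is not allowed by the consensus."""
    return sum(1 for base, code in zip(window, consensus) if base not in IUPAC[code])


def find_with_mismatches(seq, consensus, max_mismatches=3):
    """Start positions of windows within `max_mismatches` of the consensus."""
    seq, consensus = seq.upper(), consensus.upper()
    k = len(consensus)
    return [
        i for i in range(len(seq) - k + 1)
        if mismatches(seq[i:i + k], consensus) <= max_mismatches
    ]
```

The regex form is convenient for the short degenerate pattern, while the explicit mismatch count is easier to reason about when a fixed number of differences is tolerated.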

34. S. F. Neuman, S. Bhattacharya, J. R. Broach, Mol. Cell. Biol. 15, 3187 (1995).

35. P. Lesage, X. Yang, M. Carlson, ibid. 16, 1921 (1996).

36. For example, we observed large inductions of the genes coding for PCK1, FBP1 [Z. Yin et al., Mol. Microbiol. 20, 751 (1996)], the central glyoxylate cycle gene ICL1 [A. Scholer and H. J. Schüller, Curr. Genet. 23, 375 (1993)], and the "aerobic" isoform of acetyl-CoA synthase, ACS1 [M. A. van den Berg et al., J. Biol. Chem. 271, 28953 (1996)], with concomitant down-regulation of the glycolytic-specific genes PYK1 and PFK2 [P. A. Moore et al., Mol. Cell. Biol. 11, 5330 (1991)]. Other genes not directly involved in carbon metabolism but known to be induced upon nutrient limitation include genes encoding cytosolic catalase T CTT1 [P. H. Bissinger et al., ibid. 9, 1309 (1989)] and several genes encoding small heat-shock proteins, such as HSP12, HSP26, and HSP42 [I. Farkas et al., J. Biol. Chem. 266, 15602 (1991); U. M. Praekelt and P. A. Meacock, Mol. Gen. Genet. 223, 97 (1990); D. Wotton et al., J. Biol. Chem. 271, 2717 (1996)].

37. The levels of induction we measured for genes that were expressed at very low levels in the uninduced state (notably, FBP1 and PCK1) were generally lower than those previously reported. This discrepancy was likely due to the conservative background subtraction method we used, which generally resulted in overestimation of very low expression levels (49).

38. Cross-hybridization of highly related sequences can also occasionally obscure changes in gene expression, an important concern where members of gene families are functionally specialized and differentially regulated. The major alcohol dehydrogenase genes, ADH1 and ADH2, share 88% nucleotide identity. Reciprocal regulation of these genes is an important feature of the diauxic shift, but was not observed in this experiment, presumably because of cross-hybridization of the fluorescent cDNAs representing these two genes. Nevertheless, we were able to detect differential expression of closely related isoforms of other enzymes, such as HXK1/HXK2 (77% identical) [P. Herrero et al., Yeast 11, 137 (1995)], MLS1/DAL7 (73% identical) (20), and PGM1/PGM2 (72% identical) [D. Oh, J. E. Hopper, Mol. Cell. Biol. 10, 1415 (1990)], in accord with previous studies. Use in the microarray of deliberately selected DNA sequences corresponding to the most divergent segments of homologous genes, in lieu of the complete gene sequences, should relieve this problem in many cases.

39. F. E. Williams, U. Varanasi, R. J. Trumbly, Mol. Cell. Biol. 11, 3307 (1991).

40. D. Tzamarias and K. Struhl, Nature 369, 758 (1994). 

41. Differences in mRNA levels between the tup1Δ and wild-type strain were measured in two independent experiments. The correlation coefficient between the complete sets of expression ratios measured in these duplicate experiments was 0.83. The concordance between the sets of genes that appeared to be induced was very high between the two experiments. When only the 355 genes that showed at least a twofold increase in mRNA in the tup1Δ strain in either of the duplicate experiments were compared, the correlation coefficient was 0.82.
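The two concordance figures in this note, a correlation over all genes and one restricted to genes at least twofold induced in either replicate, can be computed along the lines of the sketch below. The note does not say whether correlations were computed on raw or log-transformed ratios; the sketch assumes log2 ratios, and the function names and test data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def duplicate_concordance(rep1, rep2, min_fold=2.0):
    """Correlate log2 ratios from two replicate experiments, first over all
    shared genes and then over genes induced >= min_fold in either one.
    Returns (r_all, r_induced, number_of_induced_genes)."""
    genes = sorted(rep1.keys() & rep2.keys())

    def corr(subset):
        return pearson([math.log2(rep1[g]) for g in subset],
                       [math.log2(rep2[g]) for g in subset])

    induced = [g for g in genes if rep1[g] >= min_fold or rep2[g] >= min_fold]
    return corr(genes), corr(induced), len(induced)
```

Restricting to induced genes tests whether the replicates agree on the changes that matter, not only on the bulk of unchanged genes near a ratio of 1.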

42. The tup1Δ mutation consists of an insertion of the LEU2 coding sequence, including a stop codon, between the ATG of TUP1 and an EcoRI site 124 base pairs before the stop codon of the TUP1 gene.

43. L. R. Kowalski, K. Kondo, M. Inouye, Mol. Microbiol. 15, 341 (1995).

44. M. Viswanathan, G. Muthukumar, Y. S. Cong, J. Lenard, Gene 148, 149 (1994).

45. D. Hirata, K. Yano, T. Miyakawa, Mol. Gen. Genet. 242, 250 (1994).

46. A. Gutierrez, L. Caramelo, A. Prieto, M. J. Martinez, A. T. Martinez, Appl. Environ. Microbiol. 60, 1783 (1994).

47. A. Muheim et al., Eur. J. Biochem. 195, 369 (1991).

48. J. A. Wemmie, M. S. Szczypka, D. J. Thiele, W. S. Moye-Rowley, J. Biol. Chem. 269, 32592 (1994).

49. Microarrays were scanned using a scanning laser microscope custom-built by S. Smith with software written by N. Zrv. Details concerning scanner design and construction are available at cmgm.stanford.edu/pbrown. Images were scanned at a resolution of 20 μm per pixel. A separate scan, using the appropriate excitation line, was done for each of the two fluorophores used. During the scanning process, the ratio between the signals in the two channels was calculated for several array elements containing total genomic DNA. To normalize the two channels with respect to overall intensity, we then adjusted photomultiplier and laser power settings such that the signal ratio at these elements was as close to 1.0 as possible. The combined images were analyzed with custom-written software. A bounding box, fitted to the size of the DNA spots in each quadrant, was placed over each array element. The average fluorescent intensity was calculated by summing the intensities of each pixel present in a bounding box, and then dividing by the total number of pixels. Local area background was calculated for each array element by determining the average fluorescent intensity for the lower 20% of pixel intensities. Although this method tends to underestimate the background, causing an underestimation of extreme ratios, it produces a very consistent and noise-tolerant approximation. Although the analog-to-digital board used for data collection possesses a wide dynamic range (12 bits), several signals were saturated (greater than the maximum signal intensity allowed) at the chosen settings. Therefore, extreme ratios at bright elements are generally underestimated. A signal was deemed significant if the average intensity after background subtraction was at least 2.5-fold higher than the standard deviation in the background measurements for all elements on the array.
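The quantitation steps in this note (mean intensity over each element's bounding box, a local background taken as the mean of the dimmest 20% of its pixels, and a significance cutoff of 2.5 times the standard deviation of the background across all elements) can be sketched as follows. The function names and the flat list-of-pixels input are assumptions; channel normalization, which was done at the scanner by equalizing the genomic-DNA control elements, is not repeated in software here.

```python
import statistics


def local_background(pixels, fraction=0.2):
    """Mean of the dimmest `fraction` of pixel intensities in a bounding box."""
    ordered = sorted(pixels)
    n = max(1, int(len(ordered) * fraction))
    return sum(ordered[:n]) / n


def quantify_array(spots, k=2.5):
    """Quantify every element on an array.

    spots: dict gene -> (red_pixels, green_pixels), each a list of intensities
    inside that element's bounding box; at least two elements are needed so
    that a background standard deviation can be computed.
    Returns dict gene -> (red/green ratio, red significant?, green significant?).
    """
    backgrounds = {
        gene: (local_background(red), local_background(green))
        for gene, (red, green) in spots.items()
    }
    # Spread of the background estimates across all elements, per channel.
    sd_red = statistics.stdev([bg[0] for bg in backgrounds.values()])
    sd_green = statistics.stdev([bg[1] for bg in backgrounds.values()])

    results = {}
    for gene, (red_px, green_px) in spots.items():
        # Mean intensity over the bounding box minus the local background.
        red = statistics.mean(red_px) - backgrounds[gene][0]
        green = statistics.mean(green_px) - backgrounds[gene][1]
        ratio = red / green if green > 0 else float("inf")  # no measurable green signal
        results[gene] = (ratio, red >= k * sd_red, green >= k * sd_green)
    return results
```

Because the background is averaged from the dimmest pixels, it sits below the true local background, which is the conservative bias the note and note 37 describe.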

50. In addition to the 17 genes shown in Table 1, three additional genes were induced by an average of more than threefold in the duplicate experiments, but in one of the two experiments, the induction was less than twofold (range 1.6- to 1.9-fold).
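The selection rule behind Table 1 and this note (more than twofold in each duplicate experiment and an average above threefold) can be written as a simple filter that also reports the near misses described here. The sketch assumes the "average" is the arithmetic mean of the two fold changes, which the text does not specify; the function name and the gene labels in the test are hypothetical.

```python
def select_table1_genes(rep1, rep2, min_each=2.0, min_mean=3.0):
    """Apply the Table 1 rule to fold increases from two duplicate experiments.

    rep1, rep2: dicts mapping gene -> fold increase (overexpressing / control).
    Returns (selected, near_misses): genes above `min_each` in both experiments
    with a mean above `min_mean`, and genes whose mean passes but that fall
    short in one experiment (the case described in note 50).
    """
    selected, near_misses = [], []
    for gene in sorted(rep1.keys() & rep2.keys()):
        a, b = rep1[gene], rep2[gene]
        if (a + b) / 2 <= min_mean:  # assumed arithmetic mean of the duplicates
            continue
        if a > min_each and b > min_each:
            selected.append(gene)
        else:
            near_misses.append(gene)
    return selected, near_misses
```

Requiring the per-experiment threshold in addition to the mean keeps a single strong measurement from carrying a gene into the table on its own.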

51. We thank H. Bennett, P. Spellman, J. Ravetto, M. Eisen, R. Pillai, B. Dunn, T. Ferea, and other members of the Brown lab for their assistance and helpful advice. We also thank S. Friend, D. Botstein, S. Smith, J. Hudson, and D. Dolginow for advice, support, and encouragement; K. Struhl and S. Chatterjee for the Tup1 deletion strain; L. Fernandes for helpful advice on Yap1; and S. Wapholz and the reviewers for many helpful comments on the manuscript. Supported by a grant from the National Human Genome Research Institute (NHGRI) (HG00450), and by the Howard Hughes Medical Institute (HHMI). J.D.R. was supported by the HHMI and the NHGRI. V.R. was supported in part by an Institutional Training Grant in Genome Science (T32 HG00044) from the NHGRI. P.O.B. is an associate investigator of the HHMI.

5 September 1997; accepted 22 September 1997 












Attachment 4 of 11 / H 
InUSSN: 09/918,624 
PA-0033 US 




Copyright 1997 PR Newswire Association, Inc. 
PR Newswire 

August 11, 1997, Monday 

SECTION: Financial News 

DISTRIBUTION: TO BUSINESS AND MEDICAL EDITORS 
LENGTH: 478 words 

HEADLINE: Eli Lilly & Co. and Acacia Biosciences Enter Into Research Collaboration; 
First Corporate Agreement for Acacia's Genome Reporter Matrix(TM) 
BODY:

Acacia Biosciences and Eli Lilly and Company (Lilly) announced today the signing of a joint research collaboration 
to utilize Acacia's Genome Reporter Matrix(TM) (GRM) to aid in the selection and optimization of lead compounds. 
Under the collaboration, Acacia will provide chemical and biological profiles on a class of Lilly's compounds for an 
undisclosed fee. 

Acacia's GRM is an assay-based computer modeling system that uses yeast as a miniature ecosystem. The GRM 
can profile the extent, nature and quantity of any changes in gene expression. Because of the similarities between 
the yeast and human genome, the system serves as an excellent surrogate for the human body, mimicking the effects 
induced by a biologically active molecule. 

"Using yeast as a model organism for lead optimization makes a lot of sense given the high degree of homology with 
human metabolic pathways," said William Current of Lilly Research Laboratories. "Acacia's innovative GRM has 
the potential to provide enormous insight into the therapeutic impact of our compounds and make the drug discovery 
process more rational. It should substantially accelerate the development process. " 

"This first agreement with a major pharmaceutical company is an important milestone in the development of 
Acacia," said Bruce Cohen, President and CEO of Acacia. "The deal is in line with our strategy of establishing
alliances that will allow our collaborators to use genomic profiles to identify and optimize compounds within 
their existing portfolios. In the long run, this technology can be used to characterize large scale combinatorial 
libraries, predict side effects prior to clinical trials and resurrect drugs that have failed during clinical trials." 

The GRM incorporates two critical elements: chemical response profiles and genetic response profiles. The 
chemical response profiles measure the change in gene expression caused by potential therapeutics and then rank genes 
with altered expressions by degree of response. The genetic response profiles measure changes in gene expression 
caused by mutations in the genes encoding potential targets of pharmaceuticals; these genetic response profiles represent 
gold standards in drug discovery by defining the response profile expected for drugs with perfect selectivity and 
specificity. By comparing the two profiles, one can analyze a potential drug candidate's ability to mimic the action of 
a 'perfect' drug.

Acacia Biosciences is a functional genomics company developing proprietary technologies to enhance the speed 
and efficacy of drug discovery and development. Acacia's Genome Reporter Matrix capitalizes on the latest advances 
in genomics and combinatorial chemistry to generate comprehensive profiles of drug candidates' in vivo activity. 
SOURCE Acacia Biosciences 

CONTACT: Bruce Cohen, President and CEO of Acacia Biosciences, 510-669-2330 ext. 103 or Media: Linda 
Seaton of Feinstein 
1. An important feature of the work of many molecular biologists is identifying which genes are switched on and off in a cell under different environmental conditions or subsequent to xenobiotic challenge. Such information has many uses, including the deciphering of molecular pathways and facilitating the development of new experimental and diagnostic procedures. However, the student of gene hunting could be forgiven for becoming confused by the mountain of information available, as there appear to be almost as many methods of discovering differentially expressed genes as there are research groups using the technique.

2. The aim of this review was to clarify the main methods of differential gene expression analysis and the mechanistic principles underlying them. Also included is a discussion of some of the practical aspects of using this technique. Emphasis is placed on the so-called 'open' systems, which require no prior knowledge of the genes contained within the study model. Whilst these will eventually be replaced by 'closed' systems in the study of human, mouse and other commonly studied laboratory animals, they will remain a powerful tool for those examining less fashionable models.

3. The use of suppression-PCR subtractive hybridization is exemplified in the identification of up- and down-regulated genes in rat liver following exposure to phenobarbital, a well-known inducer of the drug-metabolizing enzymes.

4. Differential gene display provides a coherent platform for building libraries and microchip arrays of 'gene fingerprints' characteristic of known enzyme inducers and xenobiotic toxicants, which may be interrogated subsequently for the identification and characterization of xenobiotics of unknown biological properties.



Introduction 

It is now apparent that the development of almost all cancers and many non-neoplastic diseases is accompanied by altered gene expression in the affected cells compared to their normal state (Hunter 1991, Wynford-Thomas 1991, Vogelstein and Kinzler 1993, Semenza 1994, Cassidy 1995, Kleinjan and van Heyningen 1998). Such changes also occur in response to external stimuli such as pathogenic microorganisms (Rohn et al. 1996, Singh et al. 1997, Griffin and Krishna 1998, Lunney 1998) and xenobiotics (Sewall et al. 1995, Dogra et al. 1998, Ramana and Kohli 1998), as well as during the development of undifferentiated cells (Hecht 1998, Rudin and Thompson 1998, Schneider-Maunoury et al. 1998). The potential medical and therapeutic benefits of understanding the molecular changes that occur in any given cell in progressing from the normal to the 'altered' state are enormous. Profiling these changes essentially provides a 'fingerprint' of each step of a cell's development or response and should help in the elucidation of specific and sensitive biomarkers representing, for example, different types of cancer or previous exposure to certain classes of chemicals that are enzyme inducers.

In drug metabolism, many of the xenobiotic-metabolizing enzymes (including the well-characterized isoforms of cytochrome P450) are inducible by drugs and chemicals in man (Pelkonen et al. 1998), predominantly through transcriptional activation not only of the cognate cytochrome P450 genes but also of additional cellular proteins that may be crucial to the phenomenon of induction. Accordingly, the development of methodology to identify and assess the full complement of genes that are either up- or down-regulated by inducers is crucial to understanding the precise molecular mechanisms of enzyme induction and how these relate to drug action. Similarly, in the field of chemical-induced toxicity, it is becoming increasingly obvious that most adverse reactions to drugs and chemicals are the result of multiple gene regulation, some of which is causal and some of which is only incidentally related to the toxicological phenomenon per se. This observation has led to an upsurge in interest in gene-profiling technologies, which differentiate between the control and toxin-treated gene pools in target tissues and are therefore of value in rationalizing the molecular mechanisms of xenobiotic-induced toxicity. Knowledge of toxin-dependent gene regulation in target tissues is not solely an academic pursuit: the pharmaceutical industry has shown much interest in harnessing this technology for the early identification of toxic drug candidates, thereby shortening the development process and contributing substantially to the safety assessment of new drugs. For example, if the gene profile induced in the testis by, say, a testicular toxin that has been well characterized in vivo could be determined, then this profile would be representative of all new drug candidates that act via this specific molecular mechanism of toxicity, thereby providing a useful and coherent approach to the early detection of such toxicants. Whereas it would be informative to know the identity and function of all genes up- or down-regulated by such toxicants, this appears to be a longer-term goal, as the majority of human genes have not yet been sequenced, far less had their function determined. However, gene profiling as currently used yields a pattern of gene changes for a xenobiotic of unknown toxicity that may be matched to that of well-characterized toxins, thus alerting the toxicologist to possible in vivo similarities between the unknown and the standard and providing a platform for more extensive toxicological examination. Such approaches are beginning to gain momentum, in that several biotechnology companies are commercially producing 'gene chips' or 'gene arrays' that may be interrogated for toxicity assessment of xenobiotics. These chips consist of hundreds to thousands of genes, some of which are degenerate in the sense that not all of the genes are mechanistically related to any one toxicological phenomenon. Although such chips are useful in broad-spectrum screening, the technology is maturing rapidly, and gene arrays are now becoming more specific, e.g. chips for the identification of changes in growth factor families that contribute to the aetiology and development of chemically induced neoplasias.

Although documenting and explaining these genetic changes presents a 
formidable obstacle to understanding the different mechanisms of development and 
disease progression, the technology is now available to begin attempting this difficult 
challenge. Indeed, several 'differential expression analysis' methods have been 
developed which facilitate the identification of gene products that demonstrate 
altered expression in cells of one population compared to another. These methods
have been used to identify differential gene expression in many situations, including 
invading pathogenic microbes (Zhao et al. 1998), in cells responding to extracellular 
and intracellular microbial invasion (Duguid and Dinauer 1990, Ragno et al. 1997, 
Maldarelli et al. 1998), in chemically treated cells (Syed et al. 1997, Rockett et al. 
1999), neoplastic cells (Liang et al. 1992, Chang and Terzaghi-Howe 1998),
activated cells (Gurskaya et al. 1996, Wan et al. 1996), differentiated cells (Hara et 
al. 1991, Guimaraes et al. 1995a, b), and different cell types (Davis et al. 1984, 
Hedrick et al. 1984, Xhu et al. 1998). Although differential expression analysis 
technologies are applicable to a broad range of models, perhaps their most important 
advantage is that, in most cases, absolutely no prior knowledge of the specific genes 
which are up- or down-regulated is required. 

The field of differential expression analysis is a large and complex one, with 
many techniques available to the potential user. These can be categorized into 
several methodological approaches, including: 

(1) Differential screening, 

(2) Subtractive hybridization (SH), including methods such as chemical cross-linking
subtraction (CCLS), suppression PCR subtractive hybridization (SSH) and
representational difference analysis (RDA),

(3) Differential display (DD), 

(4) Restriction endonuclease-facilitated analysis, including serial analysis of gene
expression (SAGE) and gene expression fingerprinting (GEF),

(5) Gene expression arrays, and 

(6) Expressed sequence tag (EST) analysis. 

The above approaches have been used successfully to isolate differentially 
expressed genes in different model systems. However, each method has its own 
subtle (and sometimes not so subtle) characteristics which incur various advantages 
and disadvantages. Accordingly, it is the purpose of this review to clarify the 
mechanistic principles underlying the main differential expression methods and to 
highlight some of the broader considerations and implications of this very powerful 
and increasingly popular technique. Specifically, we will concentrate on the so- 
called 'open' systems, namely those which do not require any knowledge of gene 
sequences and, therefore, are useful for isolating unknown genes. Two 'closed' 
systems (those utilising previously identified gene sequences), EST analysis and the 
use of DNA arrays, will also be considered briefly for completeness. Whilst 
emphasis will often be placed on suppression PCR subtractive hybridization (SSH, 
the approach employed in this laboratory), it is the aim of the authors to highlight, 
wherever possible, those areas of common interest to those who use, or intend to use, 
differential gene expression analysis. 

Differential cDNA library screening (DS) 

Despite the development of multiple technological advances which have recently 
brought the field of gene expression profiling to the forefront of molecular analysis, 
recognition of the importance of differential gene expression and characterization of 
differentially expressed genes has existed for many years. One of the original 
approaches used to identify such genes was described 20 years ago by St John and 
Davis (1979). These authors developed a method, termed 'differential plaque filter 
hybridization', which was used to isolate galactose-inducible DNA sequences from
yeast. The theory is simple: a genomic DNA library is prepared from normal, 
unstimulated cells of the test organism/tissue and multiple filter replicas are 
prepared. These replica blots are probed with radioactively (or otherwise) labelled 
complex cDNA probes prepared from the control and test cell mRNA populations. 
Those mRNAs which are differentially expressed in the treated cell population will 
show a positive signal only on the filter probed with cDNA from the treated cells. 
Furthermore, labelled cDNA from different test conditions can be used to probe 
multiple blots, thereby enabling the identification of mRNAs which are only up- 
regulated under certain conditions. For example, St John and Davis (1979) screened 
replica filters with acetate-, glucose- and galactose-derived probes in order to obtain 
genes induced specifically by galactose metabolism. Although groundbreaking in its
time, this method is now considered insensitive and time-consuming, as up to 2
months are required to complete the identification of genes which are differentially 
expressed in the test population. In addition, there is no convenient way to check 
that the procedure has worked until the whole process has been completed. 
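The logic of reading replica filters can be sketched as follows. This is a hypothetical model: it assumes that each plaque's hybridization signal on each replica filter has been quantified and that a plaque counts as positive on a filter when its signal exceeds a fixed threshold. The plaque IDs, signal values and threshold are invented for illustration. Following the St John and Davis design, a plaque is kept only if it is positive with the galactose-derived probe and negative with both the acetate- and glucose-derived probes.

```python
from typing import Dict, List

# signals[plaque_id][probe] = quantified hybridization signal (arbitrary units)
Signals = Dict[str, Dict[str, float]]


def condition_specific(signals: Signals, target: str, others: List[str],
                       threshold: float) -> List[str]:
    """Return plaques that are positive only with the target-condition probe."""
    hits = []
    for plaque, by_probe in signals.items():
        on_target = by_probe[target] > threshold
        off_elsewhere = all(by_probe[p] <= threshold for p in others)
        if on_target and off_elsewhere:
            hits.append(plaque)
    return sorted(hits)


# Hypothetical replica-filter readings.
signals = {
    "p1": {"acetate": 5, "glucose": 4, "galactose": 90},    # galactose-specific
    "p2": {"acetate": 80, "glucose": 85, "galactose": 88},  # expressed in all conditions
    "p3": {"acetate": 3, "glucose": 70, "galactose": 75},   # not galactose-specific
}
print(condition_specific(signals, "galactose", ["acetate", "glucose"], threshold=20))
```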

Subtractive Hybridization (SH) 

The developing concept of differential gene expression and the success of early 
approaches such as that described by St John and Davis (1979) soon gave rise to a 
search for more convenient methods of analysis. One of the first to be developed was 
SH, numerous variations of which have since been reported (see below). In general, 
this approach involves hybridization of mRNA/cDNA from one population (tester)
to excess mRNA/cDNA from another (driver), followed by separation of the 
unhybridized tester fraction (differentially expressed) from the hybridized common 
sequences. This step has been achieved physically, chemically and through the use 
of selective polymerase chain reaction (PCR) techniques. 
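An idealized kinetic model shows why the driver is used in excess. This is a simplifying assumption, not the behaviour of any specific protocol. When the driver is in large excess, its concentration stays roughly constant, so a tester sequence that is also present in the driver anneals with pseudo-first-order kinetics. Its single-stranded fraction after time t is then exp(-k * C_driver * t), where k is a hypothetical rate constant. Tester-specific sequences have no partner in the driver and are treated here as remaining single stranded; tester:tester reannealing is ignored.

```python
import math


def fraction_unhybridized(k: float, driver_conc: float, time: float) -> float:
    """Single-stranded fraction of a tester sequence that is also present in
    the driver, assuming pseudo-first-order annealing to excess driver."""
    return math.exp(-k * driver_conc * time)


def enrichment(k: float, driver_conc: float, time: float) -> float:
    """Fold enrichment of a tester-specific sequence (assumed to stay fully
    single stranded) over a common sequence in the unhybridized fraction."""
    return 1.0 / fraction_unhybridized(k, driver_conc, time)


# Hypothetical units: k * driver_conc * time is dimensionless.
for driver in (1.0, 5.0, 10.0):
    f = fraction_unhybridized(k=1.0, driver_conc=driver, time=1.0)
    print(f"driver={driver:>4}: common fraction left={f:.2e}, "
          f"enrichment={enrichment(1.0, driver, 1.0):.0f}x")
```

Because the driver concentration appears in the exponent, raising the driver excess improves the removal of common sequences far more steeply than a simple proportional increase would.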

Physical separation 

Original subtractive hybridization technology involved the physical separation 
of hybridized common species from unique single stranded species. Several methods 
of achieving this have been described, including hydroxyapatite chromatography 
(Sargent and Dawid 1983), avidin-biotin technology (Duguid and Dinauer 1990) 
and oligo(dT)-latex separation (Hara et al. 1991). In the first approach, common
mRNA species are removed by cDNA (from test cells)-mRNA (from control cells)
subtractive hybridization followed by hydroxyapatite chromatography, as hydroxyapatite
specifically adsorbs the cDNA-mRNA hybrids. The unadsorbed cDNA is
then used either for the construction of a cDNA library of differentially expressed 
genes (Sargent and Dawid 1983, Schneider et al. 1988) or directly as a probe to 
screen a preselected library (Zimmerman et al. 1980, Davis et al. 1984, Hedrick et al. 
1984). A schematic diagram of the procedure is shown in figure 1. 

Figure 1. The hydroxyapatite method of subtractive hybridization. cDNA derived from the
treated/altered (tester) population is mixed with a large excess of mRNA from the control (driver)
population. Following hybridization, mRNA-cDNA hybrids are removed by hydroxyapatite
chromatography. The only cDNAs which remain are those which are differentially expressed in
the treated/altered population. In order to facilitate the recovery of full-length clones, small cDNA
fragments are removed by exclusion chromatography. The remaining cDNAs are then cloned into
a vector for sequencing, or labelled and used directly to probe a library, as described by Sargent
and Dawid (1983).

Less rigorous physical separation procedures coupled with sensitivity-enhancing
PCR steps were later developed as a means to overcome some of the problems
encountered with the hydroxyapatite procedure. For example, Duguid and Dinauer
(1990) described a method of subtraction utilizing biotin-affinity systems as a means
to remove hybridized common sequences. In this process, both the control and
tester mRNA populations are first converted to cDNA and an adaptor ('oligo-vector'
containing a restriction site) is ligated to both ends. Both populations are then
amplified by PCR, but the driver cDNA population is subsequently digested with
the restriction endonuclease whose site is contained in the adaptor. This serves to
cleave the oligo-vector and reduce the amplification potential of the control population.
The digested control population is then biotinylated and an excess is mixed with tester
cDNA. Following denaturation and hybridization, the mix is applied to a biocytin column
(streptavidin may also be used) to remove the control population, including
heteroduplexes formed by annealing of common sequences from the tester
population. The procedure is repeated several times following the addition of fresh
control cDNA. In order to further enrich those species differentially expressed in
the tester cDNA, the subtracted tester population is amplified by PCR following
every second subtraction cycle. After six cycles of subtraction (three reamplification
steps) the reaction mix is ligated into a vector for further analysis.

Figure 2. The use of oligo(dT)-latex to perform subtractive hybridization. mRNA extracted from the
control (driver) population is converted to anchored cDNA using poly(dT) oligonucleotides
attached to latex beads. mRNA from the treated/altered (tester) population is repeatedly
hybridized against an excess of the anchored driver cDNA. The final population of mRNA is
tester specific and can be converted into cDNA for cloning and other downstream applications, as
described by Hara et al. (1991).

In a slightly different approach, Hara et al. (1991) utilized a method whereby 
oligo(dT30) primers attached to a latex substrate are used to first capture mRNA
extracted from the control population. Following 1st strand cDNA synthesis, the
RNA strand of the heteroduplexes is removed by heat denaturation and centrifugation
(the cDNA-oligotex-dT30 forms a pellet and the supernatant is removed).
A quantity of tester mRNA is then repeatedly hybridized to the immobilized control
(driver) cDNA (which is present in 20-fold excess). After several rounds of
hybridization, the only mRNA molecules left in the tester mRNA population are
those which are not found in the driver cDNA-oligotex-dT30 population. These
tester-specific mRNA species are then converted to cDNA and, following the 
addition of adaptor sequences, amplified by PCR. The PCR products are then 
ligated into a vector for further analysis using restriction sites incorporated into the 
PCR primers. A schematic illustration of this subtraction process is shown in figure 
2. 

However, all these methods utilising physical separation have been described as 
inefficient due to the requirement for large starting amounts of mRNA, significant 
loss of material during the separation process and a need for several rounds of 
hybridization. Hence, new methods of differential expression analysis have recently 
been designed to eliminate these problems. 
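A toy model makes the trade-off behind these criticisms concrete. The assumptions are hypothetical, not measured values: each round removes a fixed fraction of every common sequence, every round also loses a fixed fraction of all material during separation, and tester-specific sequences are depleted only by that loss.

```python
from dataclasses import dataclass


@dataclass
class RoundResult:
    rounds: int
    enrichment: float      # fold enrichment of tester-specific over common sequences
    specific_yield: float  # fraction of the original tester-specific material recovered


def simulate_rounds(n_rounds: int, removal: float, loss: float) -> RoundResult:
    """Toy model of repeated physical subtraction.

    removal: fraction of each common sequence removed per round (0-1)
    loss:    fraction of all material lost per round during separation (0-1)
    """
    common = 1.0
    specific = 1.0
    for _ in range(n_rounds):
        common *= (1.0 - removal) * (1.0 - loss)
        specific *= (1.0 - loss)
    return RoundResult(n_rounds, specific / common, specific)


# Hypothetical parameters: 90% removal and 30% handling loss per round.
for n in (1, 3, 5):
    r = simulate_rounds(n, removal=0.9, loss=0.3)
    print(f"{r.rounds} rounds: enrichment {r.enrichment:.0f}x, yield {r.specific_yield:.1%}")
```

In this model, enrichment rises geometrically with each round while the recovered tester-specific material falls geometrically, which is consistent with the need for large starting amounts of mRNA.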

Chemical Cross-Linking Subtraction ( CCLS ) 

In this technique, originally described by Hampson et al. (1992), driver mRNA 
is mixed with tester cDNA (1st strand only) in a ratio of >20:1. The common
sequences form cDNA:mRNA hybrids, leaving the tester-specific species as single-stranded
cDNA. Instead of being physically separated, these hybrids are inactivated
chemically using 2,5-diaziridinyl-1,4-benzoquinone (DZQ). Labelled probes are
then synthesized from the remaining single-stranded cDNA species (unreacted
mRNA species remaining from the driver are not converted into probe material because
of the specificity of the Sequenase T7 DNA polymerase used to make the probe) and used
to screen a cDNA library made from the tester cell population. A schematic diagram
of the system is shown in figure 3. 

It has been shown that the differentially expressed sequences can be enriched at
least 300-fold with one round of subtraction (Hampson et al. 1992), and that the
technique should allow isolation of cDNAs derived from transcripts that are present
at less than 50 copies per cell. This equates to genes at the low end of intermediate
abundance (see table 1). The main advantages of the CCLS approach are that it is
rapid, technically simple and also produces fewer false positives than other
differential expression analysis methods. However, like the physical separation
protocols, a major drawback with CCLS is the large amount of starting material
required (at least 10 μg RNA). Consequently, the technique has recently been
refined so that a renewable source of RNA can be generated. The degenerate random
oligonucleotide primed (DROP) adaptation (Hampson et al. 1996, Hampson and
Hampson 1997) uses random hexanucleotide sequences to prime solid phase-synthesized
cDNA. Since each primer includes a T7 polymerase promoter sequence
at the 5' end, the final pool of random cDNA fragments is a PCR-renewable cDNA
population which is representative of the expressed gene pool and can be used to
synthesize sense RNA for use as driver material. Furthermore, if the final pool of
random cDNA fragments is reamplified using biotinylated T7 primer and random
hexamer, the product can be captured with streptavidin beads and the antisense
strand eluted for use as tester. Since both tester and driver can be generated from
the same DROP product, subtraction can be performed in both directions (i.e. for
up- and down-regulated species) between two different DROP products.

Figure 3. Chemical cross-linking subtraction. Excess driver mRNA is mixed with 1st strand tester
cDNA. The common sequences form mRNA:cDNA hybrids which are cross-linked with
2,5-diaziridinyl-1,4-benzoquinone (DZQ), and the remaining cDNA sequences represent those
differentially expressed in the tester population. Probes are made from these sequences using
Sequenase 2.0 DNA polymerase, which lacks reverse transcriptase activity and, therefore, does not
react with the remaining mRNA molecules from the driver. The labelled probes are then used to
screen a cDNA library for clones of differentially expressed sequences. Adapted from Walter et al.
(1996), with permission.

Table 1. The abundance of mRNA species and classes in a typical mammalian cell.

Class          Copies of each   No. of mRNA        Mean % of mRNA     Mean mass (ng) of each
               species/cell     species in class   (each species)     species/μg total RNA
Abundant       12000            4                  3.3                1.65
Intermediate   300              500                0.08               0.04
Rare           15               11000              0.004              0.002

Modified from Bertioli et al. (1995).
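A short calculation checks how the columns of Table 1 relate to one another and what a 300-fold enrichment means in practice. The mass column rests on one figure the table does not state: its values are reproduced if mRNA makes up about 5% of total RNA and all mRNAs have similar length, so both are used below as inferred assumptions. The 50 copies per cell and the 300-fold figure come from the text. Treating enrichment as a simple multiplication of a transcript's share of the cDNA pool is a simplification.

```python
# Copies of each species per cell and number of species per class (Table 1).
CLASSES = {
    "Abundant": (12000, 4),
    "Intermediate": (300, 500),
    "Rare": (15, 11000),
}
MRNA_FRACTION_OF_TOTAL_RNA = 0.05  # inferred assumption, not stated in the table

total_copies = sum(copies * n_species for copies, n_species in CLASSES.values())


def percent_of_mrna(copies_per_cell: float) -> float:
    """Share of the mRNA population contributed by one species, in %."""
    return 100.0 * copies_per_cell / total_copies


def ng_per_ug_total_rna(copies_per_cell: float) -> float:
    """Mass (ng) of one species per microgram of total RNA, assuming equal-length mRNAs."""
    return percent_of_mrna(copies_per_cell) / 100.0 * MRNA_FRACTION_OF_TOTAL_RNA * 1000.0


for name, (copies, _) in CLASSES.items():
    print(f"{name:<12} {percent_of_mrna(copies):.3g}% of mRNA, "
          f"{ng_per_ug_total_rna(copies):.3g} ng/ug total RNA")

# A transcript at 50 copies per cell, before and after an idealized 300-fold enrichment.
before = percent_of_mrna(50)
after = min(100.0, before * 300)
print(f"50 copies/cell: {before:.4f}% of the pool before, about {after:.1f}% after")
```

Under these assumptions, a 50-copy transcript rises from roughly 0.014% of the pool to about 4%, which helps explain why CCLS can recover transcripts of intermediate abundance from a single round of subtraction.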

Representational Difference Analysis ( RDA ) 

RDA of cDNA (Hubank and Schatz 1994) is an extension of the technique 
originally applied to genomic DNA as a means of identifying differences between 
two complex genomes (Lisitsyn et al. 1993). It is a process of subtraction and 
amplification involving subtractive hybridization of the tester in the presence of 
excess driver. Sequences in the tester that have homologues in the driver are 
rendered unamplifiable, whereas those genes expressed only in the tester retain the 
ability to be amplified by PCR. The procedure is shown schematically in figure 4. 

In essence, the driver and tester mRNA populations are first converted to cDNA
and amplified by PCR following the ligation of an adaptor. The adaptors are then
removed from both populations and a new (different) adaptor is ligated to the
amplified tester population only. Driver and tester populations are next melted and
hybridized together in a ratio of 100:1. Following hybridization, only tester:tester
homohybrids have 5' adaptors at each end of the DNA duplex and can, thus, be filled
in at both 3' ends. Hence, only these molecules are amplified exponentially during
the subsequent PCR step. Although tester:driver heterohybrids are present, they
only amplify in a linear fashion, since the strand derived from the driver has no
adaptor to which the primer can bind. Driver:driver homohybrids have no
adaptors and, therefore, are not amplified. Single-stranded molecules are digested
with mung bean nuclease before a further PCR enrichment of the tester:tester
homohybrids. The adaptors on the amplified tester population are then replaced and
the whole process is repeated a further two or three times using an increasing excess of
driver (Hubank and Schatz used tester:driver ratios of 1:400, 1:80000 and
1:800000 for the second, third and fourth hybridizations, respectively). Different
adaptors are ligated to the tester between successive rounds of hybridization and 
amplification to prevent the accumulation of PCR products that might interfere with 
subsequent amplifications. The final display is a series of differentially expressed 
gene products easily observable on an ethidium bromide gel. 
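The selection step of RDA can be sketched with an idealized model. The simplifying assumptions are not part of the published protocol: after melting, strands of a given sequence anneal to partners at random, so a tester strand pairs with another tester strand with probability t / (t + d), where t and d are the tester and driver amounts of that sequence. The amplification rules follow the description above. Tester:tester homohybrids amplify exponentially, tester:driver heterohybrids amplify linearly, and driver:driver hybrids do not amplify.

```python
from enum import Enum


class Amplification(Enum):
    EXPONENTIAL = "exponential"
    LINEAR = "linear"
    NONE = "none"


def amplification_mode(strand1_has_adaptor: bool, strand2_has_adaptor: bool) -> Amplification:
    """Only duplexes whose two strands both carry the tester adaptor can be
    filled in at both 3' ends and amplified exponentially."""
    n = int(strand1_has_adaptor) + int(strand2_has_adaptor)
    return {2: Amplification.EXPONENTIAL, 1: Amplification.LINEAR, 0: Amplification.NONE}[n]


def homohybrid_fraction(tester: float, driver: float) -> float:
    """Idealized fraction of tester strands of one sequence that re-pair with tester."""
    return tester / (tester + driver)


# A tester-specific sequence (absent from the driver) versus a sequence common to
# both populations, at the 100:1 driver:tester ratio of the first hybridization.
specific = homohybrid_fraction(tester=1.0, driver=0.0)
common = homohybrid_fraction(tester=1.0, driver=100.0)
print(amplification_mode(True, True), amplification_mode(True, False),
      amplification_mode(False, False))
print(f"tester:tester fraction, specific={specific:.2f}, common={common:.4f}, "
      f"enrichment per round ~{specific / common:.0f}x")
```

In this idealized model, each round enriches tester-specific sequences by roughly (1 + driver excess)-fold. This illustrates why the later rounds use progressively larger driver excesses.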

The main advantages of RDA are that it offers a reproducible and sensitive 
approach to the analysis of differentially expressed genes. Hubank and Schatz (1994) 
reported that they were able to isolate genes that were differentially expressed in 
substantially less than 1% of the cells from which the tester is derived. Perhaps the 
main drawback is that multiple rounds of ligation, hybridization, amplification and
digestion are required. The procedure is, therefore, lengthier than many other
differential expression analysis approaches and provides more opportunity for
operator-induced error to occur. Although the generation of false positives has been
noted, this has been solved to some degree by O'Neill and Sinclair (1997) through the
use of HPLC-purified adaptors. These are free of the truncated adaptors which appear
to be a major source of the false positive bands. A very similar technique to RDA,
termed linker capture subtraction (LCS), was described by Yang and Sytowski (1996).
Figure 4. The representational difference analysis (RDA) technique. Driver and tester cDNA are
digested with a 4-cutter restriction enzyme such as DpnII. The 1st set of 12/24 adaptor strands
(oligonucleotides) are ligated to each other and the digested cDNA products. The 12mer is
subsequently melted away and the 3' ends filled in using Taq DNA polymerase. Each cDNA
population is then amplified using PCR, following which the 1st set of adaptors is removed with
DpnII. A second set of 12/24 adaptor strands is then added to the amplified tester cDNA
population, after which the tester is hybridized against a large excess of driver. The 12mer
adaptors are melted and the 3' ends filled in as before. PCR is carried out with primers identical
to the new 24mer adaptor. Thus, the only hybridization products which are exponentially
amplified are those which are tester:tester combinations. Following PCR, ssDNA products are
removed with mung bean nuclease, leaving the 'first difference product'. This is digested and a
third set of 12/24 adaptors added before repeating the subtraction process from the hybridization
stage. The process is repeated to the 3rd or 4th difference product, as described by Lisitsyn et al.
(1993) and Hubank and Schatz (1994).
Suppression PCR Subtractive Hybridization (SSH)

The most recent adaptation of the SH approach to differential expression
analysis was first described by Diatchenko et al. (1996) and Gurskaya et al. (1996).
They reported that a 1000- to 5000-fold enrichment of rare cDNAs (equivalent to
isolating mRNAs present at only a few copies per cell) can be obtained without the
need for multiple hybridizations/subtractions. Instead of physical or chemical
removal of the common sequences, a PCR-based suppression system is used (see
figure 5).

In SSH, excess driver cDNA is added to two portions of the tester cDNA which 
have been ligated with different adaptors. A first round of hybridization serves to 
enrich differentially expressed genes and equalize rare and abundant messages. 
Equalization occurs since reannealing is more rapid for abundant molecules than for 
rarer molecules due to the second order kinetics of hybridization (James and Higgins 
1985). The two primary hybridization mixes are then mixed together in the presence 
of excess driver and allowed to hybridize further. This step permits the annealing of 
single stranded complementary sequences which did not hybridize in the primary 
hybridization, and in doing so generates templates for PCR amplification. Although 
there are several possible combinations of the single-stranded molecules present in
the secondary hybridization mix, only one particular combination (double-stranded
molecules derived from sequences differentially expressed in the tester, whose two
complementary strands carry different adaptors) can amplify exponentially.
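The equalization step relies on second-order kinetics. For a sequence renaturing with itself, the single-stranded concentration C obeys dC/dt = -kC^2, which gives C(t) = C0 / (1 + k C0 t). The sketch below evaluates this expression for an abundant species and a rare one. The rate constant, concentrations and time are hypothetical values, chosen only to show the trend.

```python
def single_stranded(c0: float, k: float, t: float) -> float:
    """Remaining single-stranded concentration under second-order reannealing."""
    return c0 / (1.0 + k * c0 * t)


K = 1.0      # hypothetical rate constant
T = 10.0     # hypothetical hybridization time
abundant_c0, rare_c0 = 1000.0, 1.0   # 1000-fold difference before hybridization

abundant_ss = single_stranded(abundant_c0, K, T)
rare_ss = single_stranded(rare_c0, K, T)
print(f"before: {abundant_c0 / rare_c0:.0f}-fold difference")
print(f"after:  {abundant_ss / rare_ss:.2f}-fold difference")
```

As t increases, C approaches 1/(kt) regardless of C0, so abundant and rare single-stranded species converge toward similar concentrations. The model ignores the driver and any cross-hybridization, so it shows only the general trend.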

Having obtained the final differential display, two options are available if cloning 
of cDNAs is desired. One is to transform the whole of the final PCR reaction into 
competent cells. Transformed colonies can then be isolated and their inserts 
characterized by sequencing, restriction analysis or PCR. Alternatively, the final 
PCR products can be resolved on a gel and the individual bands excised, reamplified 
and cloned. The first approach is technically simpler and less time consuming. 
However, ligation/transformation reactions are known to be biased towards the 
cloning of smaller molecules, and so the final population of clones will probably not 
contain a representative selection of the larger products. In addition, although 
equalization theoretically occurs, observations in this laboratory suggest that this is 
by no means perfectly accomplished. Consequently, some gene species are present 
in a higher number than others and this will be represented in the final population 
of clones. Thus, in order to obtain a substantial proportion of those gene species that 
actually demonstrate differential expression in the tester population, the number of 
clones that will have to be screened after this step may be substantial. The second 
approach is initially more time consuming and technically demanding. However, it 
would appear to offer better prospects for cloning larger and low abundance gel 
products. In addition, one can incorporate a screening step that distinguishes
products of different sequence but the same size (HA-staining, see
later). In this way, a good idea of the final number of clones to be isolated and 
identified can be achieved. 
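The number of clones that must be picked can be estimated with the standard library-coverage formula, N = ln(1 - P) / ln(1 - f). This gives the number of random clones needed to recover a species present at fraction f of the clone population with probability P. Applying it here assumes that clones are sampled at random from the transformed pool. The frequencies below are hypothetical; they show how unequal representation drives the number upward.

```python
import math


def clones_needed(fraction: float, probability: float = 0.99) -> int:
    """Random clones to pick so that a species present at `fraction` of the
    clone population is seen at least once with the given probability."""
    if not (0.0 < fraction < 1.0 and 0.0 < probability < 1.0):
        raise ValueError("fraction and probability must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))


# Hypothetical frequencies: a well-represented product versus under-represented ones.
for f in (0.05, 0.01, 0.002):
    print(f"species at {f:.1%} of clones: pick {clones_needed(f)} clones for 99% confidence")
```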

An alternative (or even complementary) approach is to use the final differential
display reaction to screen a cDNA library to isolate full-length clones for further
characterization, or a DNA array (see later) to quickly identify known genes. SSH
has been used in this laboratory to begin characterization of the short-term gene
expression profiles of enzyme inducers such as phenobarbital (Rockett et al. 1997)
and Wy-14,643 (Rockett et al. unpublished observations). The isolation of
differentially expressed genes in this manner enables the construction of a fingerprint
of expressed genes which are unique to each compound and time/dose point. Such
information could be useful in short-term characterization of the toxic potential of
new compounds by comparing the gene-expression profiles they elicit with those
produced by known inducers. Figure 6 shows a flow diagram of the method used to
isolate, verify and clone differentially expressed genes, and figure 7 shows expression
profiles obtained from a typical SSH experiment. Subsequent sub-cloning of the
individual bands, sequencing and gene database interrogation reveal many genes
which are either up- or down-regulated by phenobarbital in the rat (tables 2 and 3).

Figure 5. PCR-select cDNA subtraction. In the primary hybridization, an excess of driver cDNA is
added to each tester cDNA population. The samples are heat denatured and allowed to hybridize
for between 3 and 8 h. This serves two purposes: (1) to equalize rare and abundant molecules; and
(2) to enrich for differentially expressed sequences, since cDNAs that are not differentially expressed
form type c molecules with the driver. In the secondary hybridization, the two primary
hybridizations are mixed together without denaturing. Fresh denatured driver can also be added
at this point to allow further enrichment of differentially expressed sequences. Type e molecules
formed in this secondary hybridization are subsequently amplified using two rounds of
PCR. The final products can be visualized on an agarose gel, labelled directly or cloned into a
vector for downstream manipulation. As described by Diatchenko et al. (1996) and Gurskaya
et al. (1996), with permission.

Figure 6. Flow diagram showing the method used in this laboratory to isolate and identify clones of
genes which are differentially expressed in rat liver following short-term exposure to the enzyme
inducers phenobarbital and Wy-14,643.

Figure 7. SSH display patterns obtained from rat liver following 3-day treatment with Wy-14,643 or
phenobarbital. mRNA extracted from control and treated livers was used to generate the
differential displays using the PCR-Select cDNA subtraction kit (Clontech). Lanes: 1, 1 kb
ladder; 2, genes up-regulated following Wy-14,643 treatment; 3, genes down-regulated following
Wy-14,643 treatment; 4, genes up-regulated following phenobarbital treatment; 5, genes
down-regulated following phenobarbital treatment; 6, 1 kb ladder. Reproduced from Rockett et
al. (1997), with permission.

One of the advantages of using the SSH approach is that no prior knowledge is
required of which specific genes are up- or down-regulated subsequent to xenobiotic
exposure, and an almost complete complement of genes is obtained. For example,
the peroxisome proliferator and non-genotoxic hepatocarcinogen Wy-14,643 up-regulates
at least 28 genes and down-regulates at least 15 in the rat (a sensitive
species) and produces 48 up- and 37 down-regulated genes in the guinea pig, a
resistant species (Rockett, Swales, Esda and Gibson, unpublished observations).
One of these genes, CD81, was up-regulated in the rat and down-regulated in the
guinea pig following Wy-14,643 treatment. CD81 (alternatively named TAPA-1) is
a widely expressed cell surface protein which is involved in a large number of cellular
processes including adhesion, activation, proliferation and differentiation (Levy et
al. 1998). Since all of these functions are altered to some extent in hepatomegaly
and non-genotoxic hepatocarcinogenesis, it is intriguing, and probably
mechanistically relevant, that CD81 expression is regulated in opposite directions
in a resistant and a susceptible species. The down-side of this approach, however, is
that although the majority of genes can be sequenced and matched to database
sequences, the matches are predominantly expressed sequence tags or genes of
completely unknown function, thus partially obscuring a realistic overall assessment
of the critical genes of genuine biological interest. Notwithstanding the lack of complete
functional identification of altered gene expression, such gene profiling studies
essentially provide a 'molecular fingerprint' in response to xenobiotic challenge,
thereby serving as a mechanistically relevant platform for further detailed
investigations.
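Comparisons such as the CD81 example can be automated once each species' profile has been reduced to up/down calls for orthologous genes. The sketch below flags genes regulated in opposite directions in a sensitive and a resistant species. Apart from CD81, whose direction of regulation is taken from the text, the gene names and calls are hypothetical placeholders. It also assumes that rat and guinea pig genes have already been mapped to shared names.

```python
from typing import Dict, List

# gene -> +1 (up-regulated) or -1 (down-regulated) after treatment
Calls = Dict[str, int]


def opposite_regulation(sensitive: Calls, resistant: Calls) -> List[str]:
    """Genes regulated in opposite directions in the two species."""
    shared = set(sensitive) & set(resistant)
    return sorted(g for g in shared if sensitive[g] == -resistant[g])


def same_regulation(sensitive: Calls, resistant: Calls) -> List[str]:
    """Genes regulated in the same direction in both species."""
    shared = set(sensitive) & set(resistant)
    return sorted(g for g in shared if sensitive[g] == resistant[g])


rat = {"CD81": +1, "geneX": +1, "geneY": -1}         # sensitive species (hypothetical except CD81)
guinea_pig = {"CD81": -1, "geneX": +1, "geneZ": +1}  # resistant species (hypothetical except CD81)

print("opposite:", opposite_regulation(rat, guinea_pig))
print("same:", same_regulation(rat, guinea_pig))
```

Genes in the "opposite" list are natural candidates for the kind of mechanistic follow-up suggested for CD81. Genes regulated in the same direction in both species are less likely to explain differences in susceptibility.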

Table 2. Genes up-regulated in rat liver following 3-day exposure to phenobarbital.

Band number              Highest sequence    FASTA-EMBL gene identification
(approximate size, bp)   similarity
5 (1300)                 93.5%               CYP2B1
7 (1000)                 95.1%               Preproalbumin; serum albumin mRNA
8 (950)                  98.3%               NCI-CGAP-Pr1 H. sapiens (EST)
10 (850)                 95.7%               CYP2B1
11                       Clone 1: 94.9%      CYP2B1
                         Clone 2: 75.3%      CYP2B2
12 (750)                 93.8%               TRPM-2 mRNA; sulfated glycoprotein
15 (600)                 92.9%               Preproalbumin; serum albumin mRNA
16 (55)                  Clone 1: 95.2%      CYP2B1
                         Clone 2: 93.6%      Haptoglobulin mRNA, partial alpha
21 (350)                 99.3%               18S, 5.8S & 28S rRNA

EST = Expressed sequence tag. Bands 1-4, 6, 9, 13, 14 and 17-20 were shown to be false positives by
dot blot analysis and, therefore, were not sequenced. Derived from Rockett et al. (1997). It should be
noted that the above genes do not represent the complete spectrum of genes which are up-regulated in
rat liver by phenobarbital, but simply represent the genes sequenced and identified to date.

Table 3. Genes down-regulated in rat liver following 3-day exposure to phenobarbital.

Band number              Highest sequence    FASTA-EMBL gene identification
(approximate size, bp)   similarity
1 (1500)                 95.3%               3-oxoacyl-CoA thiolase
2 (1200)                 92.3%               Hemopexin mRNA
3 (1000)                 91.7%               Alpha-2u-globulin mRNA
7 (700)                  Clone 1: 77.2%      M. musculus C1 inhibitor
                         Clone 2: 94.5%      Electron transfer flavoprotein
                         Clone 3: 91.0%      M. musculus topoisomerase I (Topo I)
8 (650)                  Clone 1: 86.9%      Soares 2NbMT M. musculus (EST)
                         Clone 2: 96.2%      Alpha-2u-globulin (s-type) mRNA
9 (600)                  Clone 1: 86.9%      Soares mouse NML M. musculus (EST)
                         Clone 2: 82.0%      Soares p3NMF19.5 M. musculus (EST)
10 (550)                 73.8%               Soares mouse NML M. musculus (EST)
11 (525)                 95.7%               NCI-CGAP-Pr1 H. sapiens (EST)
12 (375)                 100.0%              Ribosomal protein
13 (23)                  Clone 1: 97.2%      Soares mouse embryo NbME13.5 (EST)
                         Clone 2: 100.0%     Fibrinogen B-beta-chain
                         Clone 3: 100.0%     Apolipoprotein E gene
14 (170)                 96.0%               Soares p3NMF19.5 M. musculus (EST)
15 (140)                 97.3%               Stratagene mouse testis (EST)
Others: (300)            96.7%               R. norvegicus RASP 1 mRNA
        (275)            93.1%               Soares mouse mammary gland (EST)

EST = Expressed sequence tag. Bands 4-6 were shown to be false positives by dot blot analysis and,
therefore, were not sequenced. Derived from Rockett et al. (1997). It should be noted that the above genes
do not represent the complete spectrum of genes which are down-regulated in rat liver by phenobarbital,
but simply represent the genes sequenced and identified to date.

Differential Display (DD)

Originally described as 'RNA fingerprinting by arbitrarily primed PCR' (Liang
and Pardee 1992), this method is now more commonly referred to as 'differential
display' (DD). In this method, all the mRNA species in the control and treated cell
populations are amplified in separate reactions using reverse transcriptase-PCR
(RT-PCR). The products are then run side-by-side on sequencing gels. Those
bands which are present in one display only, or which are much more intense in one
display compared to the other, are likely to represent differentially expressed mRNAs and may be recovered for
further characterization. One advantage of this system is the speed with which it can 
be carried out — 2 days to obtain a display and as little as a week to make and identify 
clones. 

Two commonly used variations are based on different methods of priming the 
reverse transcription step (figure 8). One is to use an oligo(dT) with a 2-base 'anchor'
at the 3' end, e.g. 5'-(dT11)CA-3' (Liang and Pardee 1992). Alternatively, an
arbitrary primer may be used for 1st strand cDNA synthesis (Welsh et al. 1992).
This variant of RNA fingerprinting has also been called 'RAP-PCR' (RNA arbitrarily
primed PCR). One advantage of this second approach is that PCR products may be
derived from anywhere in the RNA, including open reading frames. In addition, it 
can be used for mRNAs that are not polyadenylated, such as many bacterial mRNAs 
(Wong and McClelland 1994). In both cases, following reverse transcription and 
denaturation, second strand cDNA synthesis is carried out with an arbitrary primer 
(arbitrary primers have a single base at each position, as compared to random
primers, which contain a mixture of all four bases at each position). The resulting 
PCR, thus, produces a series of products which, depending on the system (primer 
length and composition, polymerase and gel system), usually includes 50-100 
products per primer set (Band and Sager 1989). When a combination of different 
dT-anchors and arbitrary primers is used, almost all mRNA species from a cell can
be amplified. When the cDNA products from two different populations are analysed 
side by side on a polyacrylamide gel, differences in expression can be identified and 
the appropriate bands recovered for cloning and further analysis. 
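The size of the primer set and the number of reactions implied by these figures can be estimated with a short calculation. This is a back-of-the-envelope sketch under stated assumptions: an 11-nt dT stretch, anchor bases restricted to G, C or A as in the Figure 8 caption, and, optimistically, primer sets that amplify non-overlapping groups of mRNAs at 50-100 products each. Real displays overlap, so the result is a lower bound. The 15000 figure is the upper estimate of mRNA species per mammalian cell quoted later in this review; the function names are illustrative.

```python
from itertools import product
from typing import List

# Assumption (from the Figure 8 caption): each of the two anchor bases is G, C or A.
ANCHOR_BASES = "GCA"


def anchored_primers(n_t: int = 11) -> List[str]:
    """Enumerate hypothetical 5'-(dT)n-NN-3' anchored oligo(dT) primers."""
    return ["T" * n_t + first + second
            for first, second in product(ANCHOR_BASES, repeat=2)]


def primer_sets_needed(n_mrna_species: int, products_per_set: int) -> int:
    """Minimum number of primer combinations if every set amplified a distinct,
    non-overlapping batch of mRNAs (an optimistic simplification)."""
    return -(-n_mrna_species // products_per_set)  # ceiling division


primers = anchored_primers()
print(len(primers), primers[0])
for per_set in (50, 100):  # the 50-100 products per primer set quoted above
    print(per_set, primer_sets_needed(15000, per_set))
```

Even under these generous assumptions, covering 15000 species would need 150-300 primer combinations, which is why DD studies typically sample only part of the transcriptome.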

Although DD is perhaps the most popular approach used today for identifying 
differentially expressed genes, it does suffer from several perceived disadvantages: 

(1) It may have a strong bias towards high copy number mRNAs (Bertioli et al. 
1995), although this has been disputed (Wan et al. 1996) and the isolation of very
low abundance genes may be achieved in certain circumstances (Guimeraes et 
al. 1995a). 

(2) The cDNAs obtained often only represent the extreme 3' end of the mRNA 
(often the 3'-untranslated region), although this may not always be the case
(Guimeraes et al. 1995a). Since the 3' end is often not included in GenBank and
shows variation between organisms, cDNAs identified by DD cannot always be 
matched with their genes, even if they have been identified. 

(3) The pattern of differential expression seen on the display often cannot be 
reproduced on Northern blots, with false positives arising in up to 70% of cases 
(Sun et al. 1994). Some adaptations have been shown to reduce false positives, 
including the use of two reverse transcriptases (Sung and Denman 1997), 
comparison of uninduced and induced cells over a time course (Burn et al. 1994) 
and comparison of DD-PCR products from two uninduced and two induced
lines (Sompayrac et al. 1995). The latter authors also reported that the use of
cytoplasmic RNA rather than total RNA reduces false positives arising from
nuclear RNA that is not transported to the cytoplasm. 

Further details of the background, strengths and weaknesses of the DD 
technique can be obtained from a review by McClelland et al. (1996) and from 
articles by Liang et al. (1995) and Wan et al. (1996). 
Figure 8. Two approaches to differential display (DD) analysis. 1st strand synthesis can be carried out
either with a polydT11NN primer (where N = G, C or A) or with an arbitrary primer. The use of
different combinations of G, C and A to anchor the first strand polydT primer enables the priming
of the majority of polyadenylated mRNAs. Arbitrary primers may hybridize at none, one or more
places along the length of the mRNA, allowing 1st strand cDNA synthesis to occur at none, one
or more points in the same gene. In both cases, 2nd strand synthesis is carried out with an arbitrary
primer. Since these arbitrary primers for the 2nd strand may also hybridize to the 1st strand cDNA
in a number of different places, several different 2nd strand products may be obtained from one
binding point of the 1st strand primer. Following 2nd strand synthesis, the original set of primers
is used to amplify the second strand products, with the result that numerous gene sequences are
amplified.
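The point that an arbitrary primer may anneal at none, one or several places can be illustrated with a crude string-matching model. In this hypothetical sketch a hit is any position where the primer sequence nearly matches the scanned strand (and so could anneal to its complement). The requirement for an exactly matched 3' end and the mismatch threshold are simplifying assumptions, not a model of hybridization thermodynamics, which in practice depends on annealing temperature, salt and primer length.

```python
from typing import List


def near_match_sites(seq: str, primer: str, max_mismatches: int = 2,
                     exact_3prime: int = 3) -> List[int]:
    """Return 0-based start positions where `primer` nearly matches `seq`.

    A hit marks a place where the primer could anneal to the complementary
    strand. Simplified model (an assumption): the last `exact_3prime` bases
    must match exactly, because the polymerase extends from the primer's
    3' end, and at most `max_mismatches` mismatches are tolerated overall.
    """
    seq, primer = seq.upper(), primer.upper()
    n = len(primer)
    sites = []
    for start in range(len(seq) - n + 1):
        window = seq[start:start + n]
        # Require a perfectly matched 3' end before counting mismatches.
        if window[n - exact_3prime:] != primer[n - exact_3prime:]:
            continue
        mismatches = sum(a != b for a, b in zip(window, primer))
        if mismatches <= max_mismatches:
            sites.append(start)
    return sites


template = "ACGTACGGACCTACTTTTTTAC"
print(near_match_sites(template, "ACGTAC"))                    # [0, 8]
print(near_match_sites(template, "ACGTAC", max_mismatches=0))  # [0]
```

Relaxing `max_mismatches` to 3 adds a third site at position 16, one way to see why low-stringency annealing yields many products from a single primer.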



Restriction endonuclease-facilitated analysis of gene expression 

Serial Analysis of Gene Expression (SAGE) 

A more recent development in the field of differential display is SAGE analysis 
(Velculescu et al. 1995). This method uses a different approach to those discussed so 
far and is based on two principles. Firstly, in more than 95% of cases, short 
nucleotide sequences ('tags') of only nine or ten base pairs provide sufficient
information to identify their gene of origin. Secondly, concatenation (linking
together in a series) of these tags allows sequencing of multiple cDNAs within a 
single clone. Figure 9 shows a schematic representation of the SAGE process. In this 
procedure, double stranded cDNA from the test cells is synthesized with a 
biotinylated polydT primer. Following digestion with a frequently cutting (4-bp
recognition sequence) restriction enzyme ('anchoring enzyme'), the 3' ends of the
cDNA population are captured with streptavidin beads. The captured population is
split into two and different adaptors are ligated to the 5' ends of each group. Incorporated
into the adaptors is a recognition sequence for a type IIS restriction enzyme, one
which cuts DNA at a defined distance (< 20 bp) from its recognition sequence.
Hence, following digestion of each captured cDNA population with the type IIS enzyme,
the adaptors plus a short piece of the captured cDNA are released. The two
populations are then ligated to each other and the products amplified. The amplified products are
cleaved with the original anchoring enzyme, religated (concatemers are formed in
the process) and cloned. The advantage of this system is that hundreds of gene tags
can be identified by sequencing only a few clones. Furthermore, the number of times 
a given transcript is identified is a quantitative measurement of that gene's 
abundance in the original population, a feature which facilitates identification of 
differentially expressed genes in different cell populations. 
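The two SAGE principles can be sketched in a few lines: take a short tag from a fixed position in each transcript, then count tags so that counts stand in for abundance. The assumptions are explicit in the code: a CATG anchoring site (the NlaIII site used by Velculescu et al. 1995) and a 10-bp tag read immediately 3' of the 3'-most site. Real SAGE tags are read from sequenced concatemers rather than from known transcripts, so this only illustrates the bookkeeping. The last line shows why so few bases suffice: 4^9 and 4^10 possible sequences far exceed the 8000-15000 mRNA species per cell quoted later in this review.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Assumptions: a CATG anchoring site (NlaIII, as in Velculescu et al. 1995)
# and a 10-bp tag read immediately 3' of the 3'-most site.
ANCHOR_SITE = "CATG"
TAG_LENGTH = 10


def sage_tag(transcript: str) -> Optional[str]:
    """Return the tag beside the 3'-most anchoring site, or None if there is
    no site or too little sequence after it."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR_SITE)
    if pos == -1:
        return None
    start = pos + len(ANCHOR_SITE)
    tag = seq[start:start + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None


def tag_counts(transcripts: Iterable[str]) -> Counter:
    """Count tags; each count estimates the abundance of one transcript."""
    return Counter(t for t in map(sage_tag, transcripts) if t is not None)


def relative_abundance(counts: Counter) -> Dict[str, float]:
    """Express each tag count as a fraction of all tags in the library."""
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


library = [
    "GGCATGAAACCCGGGTTTAAAAA",
    "GGCATGAAACCCGGGTTTAAAAA",
    "CATGTTTTTTTTTTCATGGCGCGCGCGCAAAA",  # two sites: the 3'-most one is used
    "AAAAAAA",                            # no site: no tag
]
counts = tag_counts(library)
print(counts.most_common())
print(4 ** 9, 4 ** 10)  # number of possible 9-bp and 10-bp tags
```

Comparing `relative_abundance` between a control and a treated library is the step that turns tag counts into candidate differentially expressed genes.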

Some disadvantages of SAGE analysis are that the method is technically difficult,
requires a large amount of accurate sequencing and is biased towards abundant
mRNAs. In addition, it has not been validated in the pharmaco/toxicogenomic setting
and has only been used to examine well known tissue differences to date.

Gene Expression Fingerprinting (GEF)

A different capture/restriction digest approach for isolating differentially 
expressed genes has been described by Ivanova and Belyavsky (1995). In this 
method, RNA is converted to cDNA using biotinylated oligo(dT) primers. The 
cDNA population is then digested with a specific endonuclease and captured with 
magnetic streptavidin microbeads to facilitate removal of the unwanted 5' digestion
products. The use of restricted 3'-ends alone serves to reduce the complexity of the
cDNA fragment pool and helps to ensure that each RNA species is represented by 
not more than one restriction product. An adaptor is ligated to facilitate subsequent 
amplification of the captured population. PCR is carried out with one adaptor- 
specific and one biotinylated polydT primer. The reamplified population is 
recaptured and the non-biotinylated strands removed by alkaline dissociation. The 
non-biotinylated strand is then resynthesized using a different adaptor-specific 
primer in the presence of a radiolabeled dNTP. The labelled immobilized 3' cDNA
ends are next sequentially treated with a series of different restriction endonucleases
and the products from each digestion analysed by PAGE. The result is a fingerprint 
composed of a number of ladders (equal to the number of sequential digests used). 
By comparing test versus control fingerprints, it is possible to identify differentially 
expressed products which can then be isolated from the gel and cloned. The 
advantages of this procedure are that it is very robust and reproducible, and the 
authors estimate that 80-93% of cDNA molecules are involved in the final 
fingerprint. The disadvantage is that polyacrylamide gels can rarely resolve more 
than 300-400 bands, which compares poorly to the 1000 or more which are 
estimated to be produced in an average experiment. The use of 2-D gels such as 
those described by Uitterlinden et al. (1989) and Hatada et al. (1991) may help to 
overcome this problem. 
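The logic of the sequential digests can be sketched as a simple simulation. It assumes that each immobilized fragment is written from its labelled adaptor end towards the poly(A) tail, that each digest releases a labelled piece whose length is approximated by the distance to the first recognition site, and that fragments cut in one digest are no longer available to the next. BamHI (GGATCC) and EcoRI (GAATTC) are example enzymes only, not those used by Ivanova and Belyavsky (1995). The returned coverage is a rough analogue of the authors' estimate of the fraction of cDNA molecules represented in the fingerprint.

```python
from typing import Dict, List, Tuple


def gef_fingerprint(fragments: List[str], enzymes: Dict[str, str]
                    ) -> Tuple[Dict[str, List[int]], float]:
    """Simulate sequential digestion of immobilized, end-labelled 3' fragments.

    `fragments` are written 5'->3' from the labelled adaptor end towards the
    poly(A) tail; `enzymes` maps names to recognition sites, in digestion order.
    Released sizes are approximated by the distance from the labelled end to
    the first site (cut offsets within sites and adaptor length are ignored).
    Cut fragments leave the beads; uncut ones go on to the next digest.
    """
    remaining = [f.upper() for f in fragments]
    ladders: Dict[str, List[int]] = {}
    for name, site in enzymes.items():
        released: List[int] = []
        uncut: List[str] = []
        for frag in remaining:
            pos = frag.find(site)
            if pos == -1:
                uncut.append(frag)
            else:
                released.append(pos)
        ladders[name] = sorted(released)  # one lane (ladder) of the fingerprint
        remaining = uncut
    coverage = 1 - len(remaining) / len(fragments)
    return ladders, coverage


fragments = ["AAGGATCCTTTT", "CCCCGAATTCAA", "GGGGGGGG", "TTGAATTCGGATCC"]
enzymes = {"BamHI": "GGATCC", "EcoRI": "GAATTC"}  # example sites, digested in this order
ladders, coverage = gef_fingerprint(fragments, enzymes)
print(ladders, coverage)
```

Because each fragment appears in at most one lane, spreading the pool over several digests lowers the number of bands per lane, which matters when a gel resolves only a few hundred bands.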

A similar method for displaying restriction endonuclease fragments was later
described by Prashar and Weissman (1996). However, instead of sequential
digestion of the immobilized 3'-terminal cDNA fragments, these authors simply
compared the profiles of the control and treated populations without further 
manipulation. 
Figure 9. Serial analysis of gene expression (SAGE) analysis. cDNA is cleaved with an anchoring enzyme
(AE) and the 3' ends captured using streptavidin beads. The cDNA pool is divided in half and each
portion ligated to a different linker, each containing a type IIS restriction site (tagging enzyme,
TE). Restriction with the type IIS enzyme releases the linker plus a short length of cDNA
(XXXXX and OOOOO indicate nucleotides of different tags). The two pools of tags are then
ligated and amplified using linker-specific primers. Following PCR, the products are cleaved with
the AE and the ditags isolated from the linkers using PAGE. The ditags are then ligated (during
which process, concatenation occurs) and cloned into a vector of choice for sequencing. After
Velculescu et al. (1995), with permission.
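Reading tags back out of a sequenced concatemer can be sketched as follows. The sketch assumes CATG punctuation between ditags, ditags of 20-24 bp, and that each ditag consists of one tag followed by the reverse complement of its partner, so the second tag is recovered by reverse-complementing the last 10 bases. Skipping repeated ditags is an optional step included on the assumption that identical ditags are more likely to be PCR copies than independent molecules; the function and parameter names are illustrative.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def tags_from_concatemer(read: str, tag_len: int = 10, min_ditag: int = 20,
                         max_ditag: int = 24,
                         drop_duplicate_ditags: bool = True) -> List[str]:
    """Split a sequenced concatemer at anchoring-enzyme sites and recover tags.

    Assumptions: CATG punctuates the ditags, and each ditag is tag A followed
    by the reverse complement of tag B (tag B is read from the other strand).
    """
    tags: List[str] = []
    seen = set()
    for ditag in read.upper().split("CATG"):
        if not min_ditag <= len(ditag) <= max_ditag:
            continue  # linker/vector sequence or a truncated ditag
        if drop_duplicate_ditags:
            if ditag in seen:
                continue  # treat an identical ditag as a likely PCR copy
            seen.add(ditag)
        tags.append(ditag[:tag_len])
        tags.append(revcomp(ditag[-tag_len:]))
    return tags


tag_a, tag_b = "AAACCCGGGT", "TTTTGGGGAC"
ditag = tag_a + revcomp(tag_b)
read = "GGTT" + "CATG" + ditag + "CATG" + ditag + "CATG" + "ACGT"
print(tags_from_concatemer(read))
```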
DNA arrays

'Open' differential display systems are cumbersome in that it takes a great deal
of time to extract and identify candidate genes and then confirm that they are indeed 
up- or down-regulated in the treated compared to the control tissue. Normally, the 
latter process is carried out using Northern blotting or RT-PCR. Each of
the aforementioned steps produces a bottleneck on the way to the ultimate goal of rapid analysis
of gene expression. These problems will likely be addressed by the development of 
so-called DNA arrays (e.g. Gress et al. 1992, Zhao et al. 1995, Schena et al. 1996), 
the introduction of which has signalled the next era in differential gene expression 
analysis. DNA arrays consist of a gridded membrane or glass 'chip' containing
hundreds or thousands of DNA spots, each consisting of multiple copies of part of 
a known gene. The genes are often selected based on previously proven involvement 
in oncogenesis, cell cycling, DNA repair, development and other cellular processes. 
They are usually chosen to be as specific as possible for each gene and animal species.
Human and mouse arrays are already commercially available and a few companies 
will construct a personalized array to order, for example Clontech Laboratories and 
Research Genetics Inc. The technique is rapid in that hundreds or even thousands 
of genes can be spotted on a single array, and that mRNA/cDNA from the test
populations can be labelled and used directly as probe. When analysed with 
appropriate hardware and software, arrays offer a rapid and quantitative means to 
assess differences in gene expression between two cell populations. Of course, there 
can only be identification and quantitation of those genes which are in the array 
(hence the term 'closed' system). Therefore, one approach to elucidating the
molecular mechanisms involved in a particular disease/development system may be 
to combine an open and closed system — a DNA array to directly identify and 
quantitate the expression of known genes in mRNA populations, and an open 
system such as SSH to isolate unknown genes which are differentially expressed. 
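As an illustration of how array signals become a quantitative comparison of two cell populations, the sketch below computes log2 ratios of treated to control signal per gene and keeps genes changed at least twofold. Background subtraction, global (total-intensity) normalization and the twofold cut-off are assumptions chosen for simplicity; other normalization schemes, such as scaling to housekeeping genes, are also used, and the gene names and values are hypothetical.

```python
import math
from typing import Dict


def log2_ratios(control: Dict[str, float], treated: Dict[str, float],
                background: float = 0.0) -> Dict[str, float]:
    """Background-subtracted, globally normalized log2(treated/control) per gene."""
    genes = sorted(set(control) & set(treated))
    floor = 1e-9  # avoid log of zero or negative signal after background subtraction
    c = {g: max(control[g] - background, floor) for g in genes}
    t = {g: max(treated[g] - background, floor) for g in genes}
    # Global normalization: assume total signal should be equal on both arrays.
    scale = sum(c.values()) / sum(t.values())
    return {g: math.log2(t[g] * scale / c[g]) for g in genes}


def changed_genes(ratios: Dict[str, float], min_fold: float = 2.0) -> Dict[str, float]:
    """Keep genes whose normalized change is at least `min_fold` in either direction."""
    cutoff = math.log2(min_fold)
    return {g: r for g, r in ratios.items() if abs(r) >= cutoff}


control = {"geneA": 100.0, "geneB": 100.0, "geneC": 100.0, "geneD": 100.0}
treated = {"geneA": 400.0, "geneB": 100.0, "geneC": 100.0, "geneD": 100.0}
ratios = log2_ratios(control, treated)
print(changed_genes(ratios))
```

With only four spots, normalization shrinks geneA's raw fourfold increase to about 2.3-fold (16/7) because geneA itself raises the total signal; on arrays with hundreds or thousands of spots a single gene has far less influence on the scaling factor.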

One of the main advantages of DNA arrays is the huge number of gene fragments 
which can be put on a single array; some companies have reported gridding up to
60000 spots on a single glass 'chip' (microscope slide). These high density chip-based
micro-arrays will probably become available as mass-produced off-the-shelf
items in the near future. This should facilitate the more rapid determination of 
differential expression in time and dose-response experiments. Aside from their 
high cost and the technical complexities involved in producing and probing DNA 
arrays, the main problem which remains, especially with the newer micro-array 
(gene-chip) technologies, is that results are often not wholly reproducible between 
arrays. However, this problem is being addressed and should be resolved within the 
next few years. 



EST databases as a means to identify differentially expressed genes 

Expressed sequence tags (ESTs) are partial sequences of clones obtained from 
cDNA libraries. Even though most ESTs have no formal identity (putative 
identification is the best to be hoped for), they have proven to be a rapid and efficient 
means of discovering new genes and can be used to generate profiles of gene- 
expression in specific cells. Since they were first described by Adams et al. (1991), 
there has been a huge explosion in EST production and it is estimated that there are 
now well over a million such sequences in the public domain, representing over half 
of all human genes (Hillier et al. 1996). This large number of freely available
sequences (both sequence information and clones are normally available royalty-free
from the originators) has enabled the development of a new approach towards
differential gene expression analysis as described by Vasmatzis et al. (1998). The
approach is simple in theory: EST databases are first searched for genes that have a
number of related EST sequences from the target tissue of choice, but none or few 
from non-target tissue libraries. Programmes to assist in the assembly of such sets of 
overlapping data may be developed in-house or obtained privately or from the 
internet. For example, the Institute for Genomic Research (TIGR, found at 
http://www.tigr.org) provides many software tools free of charge to the scientific
community. Included amongst these is the TIGR assembler (Sutton et al. 1995), a
tool for the assembly of large sets of overlapping data such as ESTs, bacterial
artificial chromosomes (BACs) or small genomes. Candidate EST clones representing
different genes are then analysed using RNA blot methods for size and tissue
specificity and, if required, used as probes to isolate and identify the full length
cDNA clone for further characterization. In practice, however, the method is rather
more involved, requiring bioinformatic and computer analysis coupled with
confirmatory molecular studies. Vasmatzis et al. (1998) have described several
problems in this fledgling approach, such as separating highly homologous 
sequences derived from different genes and an overemphasis of specificity for some 
EST sequences. However, since these problems will largely be addressed by the 
development of more suitable computer algorithms and an increased completeness 
of the EST database, it is likely that this approach to identifying differentially 
expressed genes may enjoy more patronage in the future. 
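The first, computational step of this approach can be sketched as a filter over pre-assembled EST clusters. The sketch assumes that ESTs have already been grouped into clusters (for example with an assembler such as the TIGR assembler mentioned above) and that each EST is annotated with the tissue of its source library; the cluster IDs, tissues and thresholds are hypothetical. It inherits the problem Vasmatzis et al. (1998) describe: if highly homologous genes are merged into one cluster, its tissue counts are misleading.

```python
from typing import Dict, List


def tissue_specific_clusters(clusters: Dict[str, List[str]], target: str,
                             min_target: int = 3, max_other: int = 0) -> List[str]:
    """Return IDs of EST clusters with at least `min_target` ESTs from `target`
    libraries and at most `max_other` ESTs from any other library."""
    hits = []
    for cluster_id, library_tissues in clusters.items():
        n_target = sum(1 for tissue in library_tissues if tissue == target)
        n_other = len(library_tissues) - n_target
        if n_target >= min_target and n_other <= max_other:
            hits.append(cluster_id)
    return sorted(hits)


# Hypothetical clusters: each entry lists the source tissue of every EST in it.
clusters = {
    "c1": ["prostate"] * 4,
    "c2": ["prostate", "prostate", "liver"],
    "c3": ["prostate"] * 3 + ["brain"],
    "c4": ["liver"] * 5,
}
print(tissue_specific_clusters(clusters, "prostate"))               # ['c1']
print(tissue_specific_clusters(clusters, "prostate", max_other=1))  # ['c1', 'c3']
```

Candidates passing the filter would still need the RNA blot confirmation described above.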



Problems and potential of differential expression techniques 

The holistic or single cell approach?

When working with in vivo models of differential expression, one of the first 
issues to consider must be the presence of multiple cell types in any given specimen. 
For example, a liver sample is likely to contain not only hepatocytes, but also 
(potentially) Ito cells, bile ductule cells, endothelial cells, various immune cells (e.g. 
lymphocytes, macrophages and Kupffer cells) and fibroblasts. Other tissues will 
each have their own distinctive cell populations. Also, in the case of neoplastic tissue, 
there are almost always normal, hyperplastic and/or dysplastic cells present in a 
sample. One must, therefore, be aware that genes obtained from a differential 
display experiment performed on an animal tissue model may not necessarily arise 
exclusively from the intended 'target' cells, e.g. hepatocytes/neoplastic cells. If
appropriate, further analyses using immunohistochemistry, in situ hybridization or 
in situ RT-PCR should be used to confirm which cell types are expressing the 
gene(s) of interest. This problem is probably most acute for those studying the 
differential expression of genes in the development of different cell types, where 
there is a need to examine homogeneous cell populations. The problem is now being
addressed at the National Cancer Institute (Bethesda, MD, USA), where new
microdissection techniques have been employed to assist in their gene analysis programme,
the Cancer Genome Anatomy Project (CGAP) (for more information see web site:
http://www.ncbi.nlm.nih.gov/ncicgap/intro.html). There are also separation
techniques available that utilise cell-specific antigens as a means to isolate target cells,
e.g. fluorescence activated cell sorting (FACS) (Dunbar et al. 1998, Kas-Deelen et
al. 1998) and magnetic bead technology (Richard et al. 1998, Rogler et al. 1998). 

However, those taking a holistic approach may consider this issue unimportant. 
There is an equally appropriate view that all those genes showing altered expression 
within a compromised tissue should be taken into consideration. After all, since all
tissues are complex mixes of different, interacting cell types which intimately 
regulate each other's growth and development, it is clear that each cell type could in 
some way contribute (positively or negatively) towards the molecular mechanisms 
which lie behind responses to external stimuli or neoplastic growth. It is perhaps 
then more informative to carry out differential display experiments using in vivo as 
opposed to in vitro models, where uniform populations of identical cells probably 
represent a partial, skewed or even inaccurate picture of the molecular changes that 
occur. 

The incidence and possible implications of inter-individual biological variation 
should be considered in any approach where whole animal models are being used. It 
is clear that individuals (humans and animals) respond in different ways to identical 
stimuli. One of the best characterized examples is the debrisoquine oxidation 
polymorphism, which is mediated by cytochrome CYP2D6 and determines the 
pharmacokinetics of many commonly prescribed drugs (Lennard 1993, Meyer and 
Zanger 1997). The reasons for such differences are varied and complex, but allelic 
variations, regulatory region polymorphisms and even physical and mental health 
can all contribute to observed differences in individual responses. Careful thought 
should, therefore, be given to the specific objectives of the study and to the possible 
value of pooling starting material (tissue/mRNA). The effect of this can be 
beneficial through the ironing out of exaggerated responses and unimportant minor 
fluctuations of (mechanistically) irrelevant genes in individual animals, thus 
providing a clearer overall picture of the general molecular mechanisms of the 
response. However, at the same time such minor variations may be of utmost 
importance in deciding the ability of individual animals to succumb to or resist the 
effects of a given chemical/disease. 
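The trade-off can be made concrete with a minimal calculation. It assumes that every animal contributes an equal amount of mRNA to the pool, so the pooled level is the arithmetic mean of the individual levels; the expression values are hypothetical.

```python
from typing import List


def pooled_fold_change(treated: List[float], control: List[float]) -> float:
    """Fold change seen in pooled RNA, assuming each animal contributes an equal
    amount of mRNA so that the pooled level is the arithmetic mean."""
    return (sum(treated) / len(treated)) / (sum(control) / len(control))


control = [1.0, 1.0, 1.0, 1.0]   # hypothetical relative expression levels
treated = [1.0, 1.0, 1.0, 10.0]  # one strongly responding animal
individual = [t / c for t, c in zip(treated, control)]
print(individual)                             # [1.0, 1.0, 1.0, 10.0]
print(pooled_fold_change(treated, control))   # 3.25
```

The pool reports a 3.25-fold induction, a value that matches no individual animal: it understates the one strong responder and hides the fact that the other three did not respond at all.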



How efficient are differential expression techniques at recovering a high percentage of 
differentially expressed genes?

A number of groups have produced experimental data suggesting that mammalian
cells produce between 8000 and 15000 different mRNA species at any one time
(Mechler and Rabbitts 1981, Hedrick et al. 1984, Bravo 1990), although figures as 
high as 20-30000 have also been quoted (Axel et al. 1976). Hedrick et al. (1984) 
provided evidence suggesting that the majority of these belong to the rare abundance 
class. A breakdown of this abundance distribution is shown in table 1. 

When the results of differential display experiments have been compared with
data obtained previously using other methods, it is apparent that not all differentially 
expressed mRNAs are represented in the final display. In particular, rare messages 
(which, importantly, often include regulatory proteins) are not easily recovered 
using differential display systems. This is a major shortcoming, as the majority of 
mRNA species exist at levels of less than 0.005% of the total population (table 1). 
Bertioli et al. (1995) examined the efficiency of DD templates (heterogeneous 
mRNA populations) for recovering rare messages and were unable to detect mRNA 
species present at less than 1.2% of the total mRNA population, equivalent to an
intermediate or abundant species. Interestingly, when simple model systems (single 
target only) were used instead of a heterogeneous mRNA population, the same 
primers could detect levels of target mRNA 10000-fold lower. These results
are probably best explained by competition for substrates from the many PCR 
products produced in a DD reaction. 
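Putting these figures side by side shows the size of the gap. The three percentages come from the text; converting them to copies per cell requires a figure for the total number of mRNA molecules per cell, and the 300000 used here is a hypothetical round number for illustration only.

```python
# Figures quoted in the text above.
DD_LIMIT_COMPLEX_PCT = 1.2   # lowest abundance detected in a heterogeneous template (Bertioli et al. 1995)
SIMPLE_SYSTEM_GAIN = 10_000  # detection was this many times more sensitive with a single target
RARE_CLASS_PCT = 0.005       # most mRNA species lie below this abundance (table 1)

# Hypothetical: mRNA molecules per cell, used only to convert percentages to copies.
MRNA_PER_CELL = 300_000


def percent_to_copies(percent: float, mrna_per_cell: int = MRNA_PER_CELL) -> float:
    """Convert a percentage of total mRNA into approximate copies per cell."""
    return mrna_per_cell * percent / 100


limit_simple_pct = DD_LIMIT_COMPLEX_PCT / SIMPLE_SYSTEM_GAIN  # about 0.00012 %
gap = DD_LIMIT_COMPLEX_PCT / RARE_CLASS_PCT                   # about 240-fold
for label, pct in [("DD limit, complex template", DD_LIMIT_COMPLEX_PCT),
                   ("rare-class ceiling", RARE_CLASS_PCT),
                   ("DD limit, single target", limit_simple_pct)]:
    print(f"{label}: {pct:g}% ~ {percent_to_copies(pct):.2f} copies/cell")
print(f"complex-template limit is ~{gap:.0f}x above the rare-class ceiling")
```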

The numbers of differentially expressed mRNAs reported in the literature using
various model systems provide further evidence that many differentially expressed
mRNAs are not recovered. For example, DeRisi et al. (1997) used DNA array 
technology to examine gene expression in yeast following exhaustion of sugar in the 
medium, and found that more than 1700 genes showed a change in expression of at 
least 2-fold. In light of such a finding, it would not be unreasonable to suggest that 
of the 8000-15000 different mRNA species produced by any given mammalian cell,
up to 1000 or more may show altered expression following chemical stimulation. 
Whilst this may be an extreme figure, it is known that at least 100 genes are 
activated/upregulated in Jurkat (T-) cells following IL-2 stimulation (Ullman et al.
1990). In addition, Wan et al. (1996) estimated that interferon-γ-stimulated HeLa
cells differentially express up to 433 genes (assuming 24000 distinct mRNAs 
expressed by the cells). However, there have been few publications documenting 
anywhere near the recovery of these numbers. For example, in using DD to compare 
normal and regenerating mouse liver, Bauer et al. (1993) found only 70 of 38000 
total bands to be different. Of these, 50% (35 genes) were shown to correspond to 
differentially expressed bands. Chen et al. (1996) reported 10 genes upregulated in 
female rat liver following ethinyl estradiol treatment. McKenzie and Drake (1997) 
identified 14 different gene products whose expression was altered by phorbol 
myristate acetate (PMA, a tumour promoter agent) stimulation of a human 
myelomonocytic cell line. Kilty and Vickers (1997) identified 10 different gene 
products whose expression was upregulated in the peripheral blood leukocytes of 
allergic disease sufferers. Linskens et al. (1995) found 23 genes differentially 
expressed between young and senescent fibroblasts. Techniques other than DD 
have also provided an apparent paucity of differentially expressed genes. Using SH 
for example, Cao et al. (1997) found 15 genes differentially expressed in colorectal 
cancer compared to normal mucosal epithelium. Fitzpatrick et al. (1995) isolated 17 
genes upregulated in rat liver following treatment with the peroxisome proliferator, 
clofibrate; Philips et al. (1990) isolated 12 cDNA clones which were upregulated in 
highly metastatic mammary adenocarcinoma cell lines compared to poorly meta- 
static ones. Prashar and Weissman (1996) used 3' restriction fragment analysis and 
identified approximately 40 genes showing altered expression within 4 h of 
activation of Jurkat T-cells. Groenink and Leegwater (1996) analysed 27 gene 
fragments isolated using SSH of delayed early response phase of liver regeneration 
and found only 12 to be upregulated. 
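The proportions implied by some of the counts quoted above can be computed directly. The last line applies Wan et al.'s (1996) proportion to the 8000-15000 species range; this is purely arithmetic, and carrying one cell line's estimate over to other systems is a strong assumption.

```python
from fractions import Fraction

# Counts quoted in the text above: label -> (numerator, denominator).
reported = {
    "Bauer et al. (1993): bands confirmed / bands different": (35, 70),
    "Bauer et al. (1993): bands different / total bands": (70, 38000),
    "Groenink and Leegwater (1996): upregulated / fragments analysed": (12, 27),
    "Wan et al. (1996): estimated changed / assumed mRNA species": (433, 24000),
}

for label, (num, den) in reported.items():
    print(f"{label}: {100 * num / den:.2f}%")

# Extrapolating Wan et al.'s proportion to the 8000-15000 species range.
share = Fraction(433, 24000)
print([round(float(share * n)) for n in (8000, 15000)])
```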

In this laboratory, SSH was used to isolate up to 70 candidate genes which appear
to show altered expression in guinea pig liver following short-term treatment with 
the peroxisome proliferator, WY-14,643 (Rockett, Swales, Esdaile and Gibson, 
unpublished observations). However, these findings have still to be confirmed by 
analysis of the extracted tissue mRNA for differential expression of these sequences. 

Whilst the latest differential display technologies are purported to include design 
and experimental modifications to overcome this lack of efficiency (in both the total 
number of differentially expressed genes recovered and the percentage that are true 
positives), it is still not clear if such adaptations are practically effective; proving
efficiency by spiking with a known amount of limited numbers of artificial 
construct(s) is one thing, but isolating a high percentage of the rare messages already 
present in an mRNA population is another. Of course, some models will genuinely 
produce only a small number of differentially expressed genes. In addition, there are 
also technical problems that can reduce efficiency. For example, mRNAs may have 
an unusual primary structure that effectively prevents their amplification by
PCR-based systems. In addition, it is known that under certain circumstances not all
mRNAs have 3' poly(A) tails. For example, during Xenopus development,
deadenylation is used as a means to stabilize RNAs (Voeltz and Steitz 1998), whilst
preferential deadenylation may play a role in regulating Hsp70 (and perhaps,
therefore, other stress protein) expression in Drosophila (Dellavalle et al. 1994). The
presence of deadenylated mRNAs would clearly reduce the efficiency of systems
utilizing a polydT reverse transcription step. The efficiency of any system also 
depends on the quality of the starting material. All differential display techniques 
use mRNA as their target material. However, it is difficult to isolate mRNA that is 
completely free of ribosomal RNA. Even if polydT primers are used to prime first 
strand cDNA synthesis, ribosomal RNA is often transcribed to some degree 
(Clontech PCR-Select cDNA Subtraction kit user manual). It has been shown, at 
least in the case of SSH, that a high rRNA:mRNA ratio can lead to inefficient 
subtractive hybridization (Clontech PCR-Select cDNA Subtraction kit user 
manual), and there is no reason to suppose that it will not do likewise in other SH 
approaches. Finally, those techniques that utilise a presubtraction amplification step 
(e.g. RDA) may present a skewed representation since some sequences amplify 
better than others. 

Of course, probably the most important consideration is the temporal factor. It 
is clear that any given differential display experiment can only interrogate a cell at 
one point in time. It may well be that a high percentage of the genes showing altered 
expression at that time are obtained. However, given that disease processes and 
responses to environmental stimuli involve dynamic cascades of signalling, 
regulation, production and action, it is clear that all those genes which are switched 
on/off at different times will not be recovered and, therefore, vital information may 
well be missed. It is, therefore, imperative to obtain as much information about the 
model system beforehand as possible, from which a strategy can be derived for 
targeting specific time points or events that are of particular interest to the 
investigator. One way of getting round this problem of single time point analysis is 
to conduct the experiment over a suitable time course which, of course, adds 
substantially to the amount of work involved. 



How sensitive are differential expression technologies? 

There has been little published data that addresses the issue of how large the 
change in expression must be for it to permit isolation of the gene in question with 
the various differential expression technologies. Although the isolation of genes 
whose expression is changed as little as 1.5-fold has been reported using SSH
(Groenink and Leegwater 1996), it appears that those demonstrating a change in 
excess of 5-fold are more likely to be picked up. Thus, there is a 'grey zone' 
in between where small changes could fade in and out of isolation between 
experiments and animals. DD, on the other hand, is not subject to this grey
zone since, unlike SH approaches, it does not amplify the difference in expression 
between two samples. Wan et al. (1996) reported that differences in expression of 
twofold or more are detectable using DD. 

Resolution and visualization of differential expression products 

It seems highly improbable with current technology that a gel system could be 
developed that is able to resolve all gene species showing altered expression in any 
given test system (be it SH- or DD-based). Polyacrylamide gel electrophoresis 
(PAGE) can resolve size differences down to 0.2% (Sambrook et al. 1989) and is
used as standard in DD experiments. Even so, it is clear that a complex series of gene
products such as those seen in a DD will contain unresolvable components. Thus, 
what appears to be one band in a gel may in fact turn out to be several. Indeed, it has 
been well documented (Mathieu-Daude et al. 1996, Smith et al. 1997) that a single 
band extracted from a DD often represents a composite of heterogeneous products, 
and the same has been found for SSH displays in this laboratory (Rockett et al. 
1997). One possible solution was offered by Mathieu-Daude et al. (1996), who 
extracted and reamplified candidate bands from a DD display and used single strand 
conformation polymorphism (SSCP) analysis to confirm which components 
represented the truly differentially expressed product. 

Many scientists often try to avoid the use of PAGE where possible because it is 
technically more demanding than agarose gel electrophoresis (AGE). Unfortunately, 
high resolution agarose gels such as Metaphor (FMC, Lichfield, UK) and AquaPor 
HR (National Diagnostics, Hessle, UK), whilst easier to prepare and manipulate 
than PAGE, can only separate DNA sequences which differ in size by around 
1.5-2% (15-20 base pairs for a 1 kb fragment). Thus, SSH, RDA or other such
products which differ in size by less than this amount are normally not resolvable.
However, a simple technique does in fact exist for increasing the resolving power of
AGE: the inclusion of HA-red (10-phenyl neutral red-PEG ligand) or HA-yellow
(bisbenzamide-PEG ligand) (Hanse Analytik GmbH, Bremen, Germany) in a
gel separates identical or closely sized products on the basis of base content. Specifically,
HA-red and -yellow selectively bind to GC and AT DNA motifs, respectively 
(Wawer et al. 1995, Hanse Analytik 1997, personal communication). Since both 
HA-stains possess an overall positive charge, they migrate towards the cathode 
when an electric field is applied. This is in direct opposition to DNA, which 
is negatively charged and, therefore, migrates towards the anode. Thus, if two 
DNA clones are identical in size (as perceived on a standard high resolution 
agarose gel), but differ in AT/GC content, inclusion of an HA-dye in the gel
will retard the migration of one of the sequences compared to the
other, making it appear larger and, thus, providing a means of
differentiating between the two. The use of HA-red has been shown to resolve
sequences with an AT variation of less than 1% (Wawer et al. 1995), whilst Hanse
Analytik have reported that HA staining is so sensitive that in one case it was used 
to distinguish two 567 bp sequences which differed by only a single point mutation
(Hanse Analytik 1996, personal communication). Therefore, if one wishes to check 
whether all the clones produced from a specific band in a differential display 
experiment are derived from the same gene species, a small amount of reamplified 
or digested clone can be run on a standard high resolution gel, and a second aliquot 
Figure 10. Discrimination of clones of identical/nearly identical size using HA-red. Bands of decreasing
size (1-5) were extracted from the final display of a suppression subtractive hybridization
experiment and cloned. Seven colonies were picked at random from each cloned band and their
inserts amplified using PCR. The products were run on two gels, (A) a high resolution 2% agarose
gel, and (B) a high resolution 2% agarose gel containing 1 U/ml HA-red. With few exceptions, all 
the clones from each band appear to be the same size (gel A). However, the presence of HA-red 
(gel B), which separates identically-sized DNA fragments based on the percentage of GC within 
the sequence, clearly indicates the presence of different gene species within each band. For 
example, even though all five re-amplified clones of band 1 appear to be the same size, at least four 
different gene species are represented. 

in a similar gel containing one of the HA-stains. The standard gel should indicate 
any gross size differences, whilst the HA-stained gel should separate otherwise 
unresolvable species (on standard AGE) according to their base content. Geisinger 
et al. (1997) reported successful use of this approach for identifying DD -derived 
clones. Figure 10 shows such an experiment carried out in this laboratory on clones 
obtained from a band extracted from an SSH display. 

An alternative approach is to carry out a 2-D analysis of the differential display 
products. In this approach, size-based separation is first carried out in a standard 
agarose gel. The gel slice containing the display is then excised and incorporated 
into an HA gel for resolution based on AT/GC content. 
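The logic of the 2-D approach can be sketched as a toy model. This is only an illustration under stated assumptions: agarose is taken to resolve two fragments only when their lengths differ by more than about 2% (the upper end of the 1.5-2% figure quoted above), and the HA-dye dimension is reduced to a GC-fraction comparison with an arbitrary threshold `gc_tol`. It does not model actual dye binding or gel mobility, and the clone names and sequences are hypothetical.

```python
from itertools import combinations


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def resolvable_by_size(a: str, b: str, size_res: float = 0.02) -> bool:
    """True if the lengths differ by more than the assumed agarose resolution (2%)."""
    return abs(len(a) - len(b)) > size_res * max(len(a), len(b))


def resolvable_by_gc(a: str, b: str, gc_tol: float = 0.01) -> bool:
    """Hypothetical second dimension: GC fractions differ by more than gc_tol."""
    return abs(gc_fraction(a) - gc_fraction(b)) > gc_tol


def classify_pairs(clones: dict[str, str]) -> dict[tuple[str, str], str]:
    """Report which dimension, if either, separates each pair of clones."""
    result = {}
    for (name_a, a), (name_b, b) in combinations(sorted(clones.items()), 2):
        if resolvable_by_size(a, b):
            result[(name_a, name_b)] = "size"
        elif resolvable_by_gc(a, b):
            result[(name_a, name_b)] = "HA-dye"
        else:
            result[(name_a, name_b)] = "unresolved"
    return result


# Hypothetical clones picked from a single band.
clones = {
    "c1": "A" * 60 + "G" * 40,  # 100 bp, 40% GC
    "c2": "A" * 40 + "G" * 60,  # 100 bp, 60% GC
    "c3": "G" * 40 + "A" * 60,  # different sequence, same size and GC as c1
    "c4": "A" * 150,            # clearly larger
}
for pair, how in classify_pairs(clones).items():
    print(pair, how)
```

Pairs reported as "unresolved" correspond to the same-size, same-composition species discussed in the next paragraph, for which SSCP, DGGE or TGGE would be needed.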

Of course, one should always consider the possibility of there being different 
gene species which are the same size and have the same GC/AT content. However, 
even these species are not unresolvable given some effort: again, one might use 
SSCP, or perhaps denaturing gradient gel electrophoresis (DGGE) or temperature 
gradient gel electrophoresis (TGGE), to resolve the contents of a band, 
either directly on the extracted band (Suzuki et al. 1991) or on the reamplified 
product. 

The requirement of some differential display techniques to visualize large 
numbers of products (e.g. DD and GEF) can also present a problem in that, in terms 
of numbers, the resolution of PAGE rarely exceeds 300-400 bands. One approach to 
overcoming this might be to use 2-D gels such as those described by Uitterlinden et 
al. (1989) and Hatada et al. (1991). 
Extraction of differentially expressed bands from a gel can be complex since, in 
some cases (e.g. DD, GEF), the results are visualized autoradiographically, 
so the developed film must be overlaid precisely on the gel if the correct 
band is to be extracted for further analysis. Clearly, a misjudged extraction can 
cost many hours of work. This problem, and that of the use of radioisotopes, 
has been addressed by several groups. For example, Lohmann et al. (1995) 
demonstrated that silver staining can be used directly to visualize DD bands in 
horizontal polyacrylamide gels. An et al. (1996) avoided the use of radioisotopes by transferring a 
small amount (20-30%) of the DNA from their DD to a nylon membrane, and 
visualizing the bands using chemiluminescent detection before going back to extract 
the remaining DNA from the gel. Chen and Peck (1996) went one step further and 
transferred the entire DD to a nylon membrane. The DNA bands were then 
visualized using a digoxigenin (DIG) system (DIG was attached to the poly(dT) 
primers used in the differential display procedure). Differentially expressed bands 
were cut from the membrane and the DNA eluted by washing with PCR buffer prior 
to reamplification. 

One of the advantages of using techniques such as SSH and RDA is that the final 
display can be run on an agarose gel and the bands visualized with simple ethidium 
bromide staining. Whilst this approach can provide acceptable results, overstaining 
with SYBR Green I or SYBR Gold nucleic acid stains (FMC) enhances 
the intensity and sharpness of the bands. This greatly aids in their precise extraction 
and often reveals faint products that might otherwise be overlooked. Although 
differential displays stained with SYBR Green I are better visualized using short 
wavelength UV (254 nm) than medium wavelength UV (306 nm), the shorter 
wavelength is much more damaging to DNA. In practice, only a few seconds of 
254 nm irradiation is enough to damage the DNA in a band, effectively preventing 
reamplification and cloning. The best approach is to overstain with SYBR Green I 
and extract bands under medium wavelength UV transillumination. 

The possible use of 'microfingerprinting' to reduce complexity 

Given the sheer number of gene products and the possible complexity of each 
band, an alternative approach to rapid characterization may be to use an enhanced 
analysis of a small section of a differential display: a 'sub-fingerprint' or 
'micro-fingerprint'. In this case, one could concentrate on those bands which appear 
only in a particular chosen size region. Reducing the fingerprint in this way has at least 
two advantages. First, it should be possible to use gel types, 
concentrations and run times tailored exactly to that region. Currently, one might 
run products from 100 to more than 3000 bp on the same gel, which forces a compromise in the 
choice of gel system, consequently leads to suboptimal resolution, both in terms of 
size and numbers, and can cause problems in the accurate excision of individual 
bands. Secondly, it may be possible to enhance resolution by using a 2-D analysis 
with an HA-stain, as described earlier. In summary, if a range of gene product sizes 
is carefully chosen to include certain 'relevant' genes, the 2-D system standardized, 
and appropriate gene analysis used, it may be possible to develop a method for the 
early and rapid identification of compounds which have similar or widely different 
cellular effects. If the prognosis for exposure to one or more other chemicals which 
display a similar profile is already known, then one could perhaps predict similar 
effects for any new compounds which show a similar micro-fingerprint. 
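One way to picture the micro-fingerprint comparison is to keep only the bands inside a chosen size window and score the overlap between compounds. The sketch below is hypothetical throughout. Bands are represented simply as sizes in bp. Two bands are treated as matching if they fall in the same 10 bp bin, which is an assumed and rather crude tolerance. Jaccard similarity is used as the comparison metric, although the text does not specify one. The compound names and band sizes are invented.

```python
def micro_fingerprint(band_sizes: list[int], window: tuple[int, int] = (300, 600),
                      bin_bp: int = 10) -> set[int]:
    """Return the size bins (size // bin_bp) of bands inside the chosen window."""
    lo, hi = window
    return {size // bin_bp for size in band_sizes if lo <= size <= hi}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two band sets (1.0 = identical, 0.0 = disjoint)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def most_similar(new_bands: list[int], reference_profiles: dict[str, list[int]],
                 **kwargs) -> tuple[str, float]:
    """Find the reference compound whose micro-fingerprint best matches new_bands."""
    target = micro_fingerprint(new_bands, **kwargs)
    scores = {name: jaccard(target, micro_fingerprint(bands, **kwargs))
              for name, bands in reference_profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical reference compounds whose effects are already known.
references = {
    "compound_A": [120, 310, 415, 512, 890],
    "compound_B": [150, 330, 450, 590, 1200],
}
print(most_similar([100, 312, 418, 515, 700], references))
```

Here the new compound's bands inside the 300-600 bp window coincide with those of `compound_A`, so the call prints `('compound_A', 1.0)`. Bands outside the window are ignored entirely, which is the point of restricting the analysis to one size region.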
An alternative approach to microfingerprinting is to examine altered expression 
in specific families of genes through careful selection of PCR primers and/or 
post-reaction analysis. Stress genes, growth factors and/or their receptors, cell cycle 
genes, cytochromes P450 and regulatory proteins might be considered as candidates 
for analysis in this way. Indeed, some off-the-shelf DNA arrays (e.g. Clontech's 
Atlas cDNA Expression Array series) already anticipate this to some degree by 
grouping together genes involved in different responses, e.g. apoptosis, stress and 
DNA-damage response. 



Screening 

False positives 

The generation of false positives has been discussed at length amongst the 
differential display community (Liang et al. 1993, 1995, Nishio et al. 1994, Sun et al. 
1994, Sompayrac et al. 1995). The causes of false positives vary with the 
technique being used. For instance, in RDA, the use of adaptors which have not 
been HPLC purified can lead to false positives through illegitimate 
ligation events (O'Neill and Sinclair 1997), whilst in DD they can arise through 
PCR artifacts and illegitimate transcription of rRNA. In SH, false positives appear 
to be derived largely from abundant gene species, although some may arise from 
cDNA/mRNA species which fail to hybridize for technical reasons. 

A quick screen of putative differentially expressed clones can be carried out 
using a simple dot blot approach, in which labelled first strand probes synthesized 
from tester and driver mRNA are hybridized to an array of the clones (Hedrick et 
al. 1984, Sakaguchi et al. 1986). Differentially expressed clones will hybridize to the 
tester probe, but not the driver. The disadvantage of this approach is that rare species 
may not generate detectable hybridization signals. One option for those using SSH 
is to screen the clones using a labelled probe generated from the subtracted cDNA 
from which they were derived, and with a probe made from the reverse subtraction 
reaction (ClonTechniques 1997a). Since the SSH method enriches rare sequences, 
it should then be possible to confirm the presence of clones representing low abundance 
genes. Despite this quick screening step, there is still the need to go back to the 
original mRNA and confirm the altered expression using a more quantitative 
approach. Although this may be achieved using Northern blots, their sensitivity is 
poor by today's standards and one must rely on PCR methods for accurate and 
sensitive determinations (see below). 
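The dot-blot decision rule ("hybridizes to tester probe, but not driver") can be written as a simple classification of signal intensities. This is a sketch only. The detection limit, the two-fold ratio cut-off and the signal values are assumptions chosen for illustration, not values from the cited protocols. Clones whose signals fall below the detection limit on both blots are flagged rather than discarded, reflecting the point above that rare species may give no detectable signal.

```python
def classify_clone(tester: float, driver: float,
                   detection_limit: float = 100.0, min_ratio: float = 2.0) -> str:
    """Classify one dot-blot clone from background-subtracted signal intensities."""
    if tester < detection_limit and driver < detection_limit:
        # Possibly a rare transcript: confirm with a more sensitive method (e.g. RT-PCR).
        return "undetected"
    # Guard against division by zero when the driver signal is absent.
    ratio = tester / max(driver, 1e-9)
    if ratio >= min_ratio:
        return "candidate up-regulated"
    if ratio <= 1 / min_ratio:
        return "candidate down-regulated"
    return "not differential"


# Hypothetical (tester, driver) signals for four clones.
for signals in [(5000, 200), (50, 40), (1000, 900), (150, 2000)]:
    print(signals, classify_clone(*signals))
```

Clones classed as "undetected" are exactly the ones for which screening with probes from the subtracted (enriched) cDNA, or going back to the original mRNA with RT-PCR, is most useful.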



Sequence analysis 

The majority of differential display procedures produce final products which are 
between 100 and 1000 bp in size. Such short products considerably reduce the length 
of sequence available for searching the DNA databases. This in turn leads to reduced 
confidence in the result, since several families of genes have members whose DNA 
sequences are almost identical except in a few key stretches, e.g. the cytochrome 
P450 gene superfamily (Nelson et al. 1996). Thus, does the clone identified as being 
almost identical to gene X0 really come from that gene, or from its brother gene X1 or its 
as yet undiscovered sister X2? For example, using SSH, part of a gene was isolated 
which was up-regulated in the liver of rats exposed to Wy-14,643 and was identified 
by a FASTA search as being transferrin (data not shown). However, transferrin is 
known to be down-regulated by hypolipidemic peroxisome proliferators such as 
Wy-14,643 (Hertz et al. 1996), and this was confirmed with subsequent RT-PCR 
analysis. This suggests that the isolated sequence may belong to a gene which 
is closely related to transferrin, but is regulated by a different mechanism. 

A further problem associated with SH technology is redundancy. In most cases, 
before SH is carried out, the cDNA population must first be simplified by restriction 
digestion. This is important for at least two reasons: 

(1) To reduce complexity — long cDNA fragments may form complex networks 
which prevent the formation of appropriate hybrids, especially at the high 
concentrations required for efficient hybridization. 

(2) Cutting the cDNAs into small fragments provides better representation of 
individual genes. This is because cDNAs derived from related but distinct 
members of gene families often have similar coding sequences that may 
cross-hybridize and be eliminated during the subtraction procedure (Ko 1990). 
Furthermore, different fragments from the same cDNA may differ considerably 
in their efficiency of hybridization and amplification, so that some fragments 
hybridize or amplify poorly (Wang and Brown 1991). Thus, some fragments from 
differentially expressed cDNAs may be eliminated during subtractive hybridization 
procedures, whilst other fragments of the same cDNAs are enriched and isolated. As a 
consequence, a gene that is cut one or more times gives rise to two 
or more fragments of different sizes. If that gene is differentially 
expressed, then two or more of these fragments may come through 
as separate bands on the final differential display, increasing the observed 
redundancy and the number of redundant sequencing reactions. 
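The following in silico digest shows why a single differentially expressed gene can yield several bands. The sketch assumes RsaI (recognition site GT^AC, a blunt four-base cutter commonly used to prepare cDNA for suppression subtractive hybridization), although the text does not name an enzyme. The input sequence is a made-up example. Because GTAC is palindromic, scanning one strand finds every site.

```python
def digest(seq: str, site: str = "GTAC", cut_offset: int = 2) -> list[str]:
    """Cut seq at every occurrence of site, cut_offset bases into the site.

    Defaults model RsaI (GT^AC, blunt); the enzyme choice is an assumption.
    """
    seq = seq.upper()
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


# Hypothetical cDNA with two RsaI sites.
fragments = digest("AAAAGTACTTTTTTGTACCC")
print(fragments, [len(f) for f in fragments])
```

Two sites produce three fragments of different sizes (6, 10 and 4 bp in this toy case). If the parent gene is differentially expressed, each fragment that survives subtraction can appear as a separate band and be sequenced separately.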

Sequence comparisons also raise another important point: at what degree 
of sequence similarity does one accept a result? Is 90% identity between a gene 
from your model species and its counterpart in another species acceptably close? Is 95% between 
your sequence and one from the same species also acceptable? This problem is 
particularly relevant when the forward and reverse sequence comparisons return matches 
to completely different gene species! A common, if arbitrary, convention is to classify 
genes with 95% or greater similarity as definite, and to group those between 
60 and 95% as related genes or possible homologues. 
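The convention described above can be written as a simple rule. The 95% and 60% cut-offs are the arbitrary thresholds quoted in the text. The function names, the label for values below 60%, and the choice to let the weaker of the forward and reverse reads set the class are illustrative assumptions. The gene names in the example are hypothetical.

```python
def classify_hit(percent_identity: float) -> str:
    """Assign a database hit to a confidence class by percent identity."""
    if not 0 <= percent_identity <= 100:
        raise ValueError("percent_identity must be between 0 and 100")
    if percent_identity >= 95:
        return "definite"
    if percent_identity >= 60:
        return "related/possible homologue"
    return "no reliable match"


def assign_clone(forward: tuple[str, float], reverse: tuple[str, float]) -> str:
    """Combine forward and reverse read hits, each given as (gene name, % identity)."""
    (gene_f, id_f), (gene_r, id_r) = forward, reverse
    if gene_f != gene_r:
        return "conflict: re-examine"  # the two reads point to different genes
    # Conservative choice: the weaker read determines the class.
    return f"{gene_f}: {classify_hit(min(id_f, id_r))}"


print(assign_clone(("gene X0", 98.0), ("gene X0", 91.0)))
print(assign_clone(("gene X0", 99.0), ("gene X1", 99.0)))
```

The second call illustrates the awkward case raised above, where forward and reverse reads match different genes. Such clones are flagged for closer inspection rather than assigned.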

Quantitative analysis 

At some point, one must give consideration to the quantitative analysis of the 
candidate genes, either as a means of confirming that they are truly differentially 
expressed, or in order to establish just what the differences are. Northern blot 
analysis is a popular approach as it is relatively easy and quick to perform. However, 
the major drawback with Northern blots is that they are often not sensitive enough 
to detect rare sequences. Since the majority of messages expressed in a cell are of low 
abundance (see table 1), this is a major problem. Consequently, RT-PCR may be the 
method of choice for confirming differential expression. Although the procedure is 
somewhat more complex than Northern analysis, requiring synthesis of primers and 
optimization of reaction conditions for each gene species, it is now possible to set up 
high throughput PCR systems using multichannel pipettes, 96-well (or larger) plates and 
appropriate thermal cycling technology. Whilst fully quantitative (competitive) RT-PCR is more 
desirable, being more accurate and not reliant on an internal standard, the 
money and time needed to develop a competitor molecule for each gene are often excessive, 
especially when one might be examining tens or even hundreds of gene species. The 
use of semi-quantitative analysis is simpler, although still relatively involved. One 
must first of all choose an internal standard that does not change in the test cells 
compared to the controls. Numerous reference genes have been tried in the past, for 
example interferon-gamma (IFN-γ, Frye et al. 1989), β-actin (Heuval et al. 1994), 
glyceraldehyde-3-phosphate dehydrogenase (GAPDH, Wong et al. 1994), 
dihydrofolate reductase (DHFR, Mohler and Butler 1991), β2-microglobulin 
(β2-m, Murphy et al. 1990), hypoxanthine phosphoribosyl transferase (HPRT, Foss et 
al. 1998) and a number of others (ClonTechniques 1997b). Ideally, an internal 
standard should not change its level of expression in the cell regardless of cell age, 
stage in the cell cycle or through the effects of external stimuli. However, it has been 
shown on numerous occasions that the levels of most housekeeping genes currently 
used by the research community do in fact change under certain conditions and in 
different tissues (ClonTechniques 1997b). It is imperative, therefore, that 
preliminary experiments be carried out on a panel of housekeeping genes to establish 
their suitability for use in the model system. 
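The preliminary panel screen and the normalization that follows can be sketched as below. This assumes semi-quantitative band intensities (arbitrary units) for each candidate housekeeping gene across control and test samples. The coefficient of variation across all samples is used as a crude stability score, which is a simplification; the text does not prescribe a metric. The target gene's fold change is then computed after normalizing to the chosen reference. All numbers are invented for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(values: list[float]) -> float:
    """Standard deviation divided by mean; lower means more stable expression."""
    return stdev(values) / mean(values)


def most_stable(panel: dict[str, list[float]]) -> str:
    """Pick the candidate reference gene with the lowest CV across all samples."""
    return min(panel, key=lambda gene: coefficient_of_variation(panel[gene]))


def normalized_fold_change(target_ctrl: list[float], target_test: list[float],
                           ref_ctrl: list[float], ref_test: list[float]) -> float:
    """Mean target/reference ratio in test samples divided by that in controls."""
    ctrl = mean(t / r for t, r in zip(target_ctrl, ref_ctrl))
    test = mean(t / r for t, r in zip(target_test, ref_test))
    return test / ctrl


# Hypothetical panel: 3 control samples followed by 3 treated samples.
panel = {
    "GAPDH": [100, 102, 98, 150, 155, 148],      # rises on treatment
    "beta-actin": [200, 205, 198, 202, 199, 201],
    "HPRT": [50, 80, 45, 60, 30, 70],            # noisy
}
ref = most_stable(panel)
fold = normalized_fold_change([50, 52, 48], [250, 260, 240], panel[ref][:3], panel[ref][3:])
print(ref, round(fold, 2))
```

With these invented numbers, GAPDH would be a poor choice because its signal rises on treatment. Normalizing to it would understate the target's change. This is exactly why the panel screen comes first.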

Interpretation of quantitative data must also be treated with caution. By 
comparing the lists of genes identified by differential expression one can perhaps 
gain insight into why two different species react in different ways to external stimuli. 
For example, rats and mice appear sensitive to the non-genotoxic effects of a wide 
range of peroxisome proliferators whilst Syrian hamsters and guinea pigs are largely 
resistant (Orton et al. 1984, Rodricks and Turnbull 1987, Lake et al. 1989, 1993, 
Makowska et al. 1992). A simplified approach to resolving the reason(s) why is to 
compare lists of up- and down-regulated genes in order to identify those which are 
expressed in only one species and which, through background knowledge of the effects of 
each gene, might suggest a mechanism of facilitated non-genotoxic carcinogenesis 
or protection. Of course, the situation is likely to be far more complex. Suppose 
there were one key gene protecting the guinea pig from non-genotoxic effects, and it was 
up-regulated 50 times by peroxisome proliferators, whereas the same gene was up-regulated 
only five times in the rat. Since the gene was noted to be up-regulated in both species, its 
importance may be overlooked. Just to complicate matters, a large change in expression 
does not necessarily mean a biologically important change. For example, what is the 
true relevance of gene Y, which shows a 50-fold increase after a particular treatment, 
and gene Z, which shows only a 5-fold increase? If one examines the literature one 
may find that, historically, gene Y has often been shown to be up-regulated 40- to 60-fold 
by a number of unrelated stimuli; in light of this, the 50-fold increase would 
appear less significant. However, the literature may show that gene Z has never been 
recorded as having more than doubled in expression, which makes your 5-fold 
increase all the more exciting. Perhaps even more interesting is if that same 5-fold 
increase has only been seen in related neoplasms or following treatment with related 
chemicals. 
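The gene Y / gene Z argument amounts to judging each fold change against what has been reported before. A minimal sketch follows. It assumes that one has collected, for each gene, the largest fold change previously reported in the literature. The ratio of the new fold change to that historical maximum then serves as a crude "novelty" score. This is one possible heuristic, not an established statistic. The values below simply restate the hypothetical example in the text.

```python
def novelty_score(observed_fold: float, max_reported_fold: float) -> float:
    """Observed fold change divided by the largest previously reported one."""
    if observed_fold <= 0 or max_reported_fold <= 0:
        raise ValueError("fold changes must be positive")
    return observed_fold / max_reported_fold


# Hypothetical genes from the example in the text: (observed, historical maximum).
genes = {
    "gene Y": (50.0, 60.0),  # 50-fold now; historically up to 60-fold
    "gene Z": (5.0, 2.0),    # 5-fold now; never more than doubled before
}
for name, (observed, history) in genes.items():
    print(f"{name}: novelty {novelty_score(observed, history):.2f}")
```

This prints a score of 0.83 for gene Y and 2.50 for gene Z. A score below 1 means the change lies within the range already seen for unrelated stimuli. A score well above 1 marks the more surprising result, matching the reasoning above.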

Problems in using the differential display approach 

Differential display technology originally held the promise of an easily obtainable 
'fingerprint' of those genes which are up- or down-regulated in test animals/cells in 
a developmental process or following exposure to given stimuli. However, it has 
become clear that the fingerprinting process, whilst still valid, is much too complex 
to be captured by the profile from any single technique. This is because all differential display 
techniques have common and/or unique technical problems which preclude the 
isolation and identification of all those genes which show changes in expression. 
Furthermore, there are important genetic changes related to disease development 
which differential expression analysis is simply not designed to address. An example 
of this is the presence of small deletions, insertions, or point mutations such as those 
seen in activated oncogenes, tumour suppressor genes and individual poly- 
morphisms. Polymorphic variations, small though they usually are, are often 
regarded as being of paramount importance in explaining why some patients 
respond better than others to certain drug treatments (and, in logical extension, why 
some people are less affected by potentially dangerous xenobiotics/carcinogens than 
others). The identification of such point mutations and naturally occurring 
polymorphisms requires the subsequent application of sequencing, SSCP, DGGE 
or TGGE to the gene of interest. Furthermore, differential display is not designed 
to address issues such as alternatively spliced gene species or whether an increased 
abundance of mRNA is a result of increased transcription or increased mRNA 
stability. 



Conclusions 

Perhaps the main advantage of open system differential display techniques is that 
they are not limited by extant theories or researcher bias in revealing genes which are 
differentially expressed, since they are designed to amplify all genes which 
demonstrate altered expression. This means that they are useful for the isolation of 
previously unknown genes which may turn out to be useful biomarkers of a particular 
state or condition. At least one open system (SAGE) is also quantitative, thus 
eliminating the need to return to the original mRNA and carry out Northern/PCR 
analysis to confirm the result. However, the rapid progress of genome mapping 
projects means that over the next 5-10 years or so, the balance of experimental use 
will switch from open to closed differential display systems, particularly DNA 
arrays. Arrays are easier and faster to prepare and use, provide quantitative data, are 
suitable for high throughput analysis and can be tailored to look at specific signalling 
pathways or families of genes. The identification of all the gene sequences in human and 
common laboratory animals, combined with improved DNA array technology, 
means that it will soon no longer be necessary to try to isolate differentially expressed 
genes using the technically more demanding open system approach. Thus, the 
main advantage of open systems (that of identifying unknown genes) will be largely lost. It is 
likely, therefore, that their sphere of application will be reduced to analysis of the 
less common laboratory species, since it will be some time yet before the genomes of 
such animals as zebrafish, electric eels, gerbils, crayfish and squid, for example, will 
be sequenced. 

Of course, in the end the question will always remain: What is the functional/biological 
significance of the identified, differentially expressed genes? One 
persistent problem is understanding whether differentially expressed genes are a 
cause or consequence of the altered state. Furthermore, many chemicals, such as 
non-genotoxic carcinogens, are also mitogens and so genes associated with 
replication will also be upregulated but may have little or nothing to do with the 
carcinogenic effect. Whilst differential display technology cannot hope to answer 
these questions, it does provide a springboard from which identification, regulatory 
and functional studies can be launched. Understanding the molecular mechanism of 
cellular responses is almost impossible without knowing the regulation and function 
of those genes and their condition (e.g. mutated). In an abstract sense, differential 
display can be likened to a still photograph, showing details of a fixed moment in 
time. Consider the historian who knows the outcome of a battle and the placement 
and condition of the troops before the battle commenced, but is asked to try and 
deduce how the battle progressed and why it ended as it did from a few still 
photographs: an impossible task. In order to understand the battle, the historian 
must find out the capabilities and motivation of the soldiers and their commanding 
officers, what the orders were and whether they were obeyed. He must examine the 
terrain, the remains of the battle and consider the effects the prevailing weather 
conditions exerted. Likewise, if mechanistic answers are to be forthcoming, the 
scientist must use differential display in combination with other techniques, such as 
knockout technology, the analysis of cell signalling pathways, mutation analysis and 
time and dose response analyses. Although this review has emphasized the 
importance of differential gene profiling, it should not be considered in isolation and 
the full impact of this approach will be strengthened if used in combination with 
functional genomics and proteomics (2-dimensional protein gels from isoelectric 
focusing and subsequent SDS electrophoresis and virtual 2D-maps using capillary 
electrophoresis). Proteomics is attracting much recent attention because many of the 
changes that alter gene function do not involve the changes in mRNA 
levels described extensively herein, but rather protein-protein, protein-DNA and 
protein phosphorylation events, which require functional genomics or 
proteomic technologies for investigation. 

Despite the limitations of differential display technology, it is clear that many 
potential applications and benefits can be obtained from characterizing the genetic 
changes that occur in a cell during normal and disease development and in response 
to chemical or biological insult. In light of functional data, such profiling will 
provide a 'fingerprint' of each stage of development or response, and in the long 
term should help in the elucidation of specific and sensitive biomarkers for different 
types of chemical/biological exposure and disease states. The potential medical and 
therapeutic benefits of understanding such molecular changes are almost 
immeasurable. Amongst other things, such fingerprints could indicate the family or 
even specific type of chemical an individual has been exposed to plus the length 
and/or acuteness of that exposure, thus indicating the most prudent treatment. 
They may also help uncover differences in histologically identical cancers, provide 
diagnostic tests for the earliest stages of neoplasia and, again, perhaps indicate the 
most efficacious treatment. 

The Human Genome Project will be completed early in the next century and the 
DNA sequence of all the human genes will be known. The continuing development 
and evolution of differential gene expression technology will ensure that this 
knowledge contributes fully to the understanding of human disease processes. 

ABSTRACT The recent ability to sequence whole genomes 
allows ready access to all genetic material. The approaches 
outlined here allow automated analysis of sequence for the 
synthesis of optimal primers in an automated multiplex 
oligonucleotide synthesizer (AMOS). The efficiency is such 
that all ORFs for an organism can be amplified by PCR. The 
resulting amplicons can be used directly in the construction of 
DNA arrays or can be cloned for a large variety of functional 
analyses. These tools allow a replacement of single-gene 
analysis with a highly efficient whole-genome analysis. 



The genome sequencing projects have generated and will 
continue to generate enormous amounts of sequence data. The 
genomes of Saccharomyces cerevisiae, Escherichia coli, Haemophilus 
influenzae (1), Mycoplasma genitalium (2), and Methanococcus 
jannaschii (3) have been completely sequenced. 
Other model organisms have had substantial portions of their 
genomes sequenced as well, including the nematode Caenorhabditis 
elegans (4) and the small flowering plant Arabidopsis 
thaliana (5). This massive and increasing amount of sequence 
information allows the development of novel experimental 
approaches to identify gene function. 

One standard use of genome sequence data is to attempt to 
identify the functions of predicted open reading frames 
(ORFs) within the genome by comparison to genes of known 
function. Such a comparative analysis of all ORFs to existing 
sequence data is fast, simple, and requires no experimentation 
and is therefore a reasonable first step. While finding sequence 
homologies/motifs is not a substitute for experimentation, 
noting the presence of sequence homology and/or sequence 
motifs can be a useful first step in finding interesting genes, in 
designing experiments and, in some cases, predicting function. 
However, this type of analysis is frequently uninformative. For 
example, over one-half of new ORFs in S. cerevisiae have no 
known function (6). If this is the case in a well studied organism 
such as yeast, the problem will be even worse in organisms that 
are less well studied or less manipulable. A large, experimentally 
determined gene function database would make homology/motif 
searches much more useful. 

Experimental analysis must be performed to thoroughly 
understand the biological function of a gene product. Scaling 
up from classical "cottage industry" one-gene-oriented 
approaches to whole-genome analysis would be very expensive 
and laborious. It is clear that novel strategies are necessary to 
efficiently pursue the next phase of the genome projects: 
whole-genome experimental analysis to explore gene 
expression, gene product function, and other genome functions. 
Model organisms, such as S. cerevisiae, will be extremely 
important in the development of novel whole-genome analysis 
techniques and, subsequently, in improving our understanding 
of other more complex and less manipulable organisms. 

The genome sequence can be systematically used as a tool 
to understand ORFs, gene product function, and other genome 
regions. Toward this end, a directed strategy has been 
developed for exploiting sequence information as a means of 
providing information about biological function (Fig. 1). 
Efforts have been directed toward the amplification of each 
predicted ORF or any other region of the genome ranging 
from a few base pairs to several kilobase pairs. There are many 
uses for these amplicons — they can be cloned into standard 
vectors or specialized expression vectors, or can be cloned into 
other specialized vectors such as those used for two-hybrid 
analysis. The amplicons can also be used directly by, for 
example, arraying onto glass for expression analysis, for DNA 
binding assays, or for any direct DNA assay (7). As a pilot 
study, synthetic primers were made on the 96-well automated 
multiplex oligonucleotide synthesizer (AMOS) instrument (8) 
(Fig. 2). These oligonucleotides were used to amplify each 
ORF on yeast chromosome V. The current version of this 
instrument can synthesize three plates of 96 oligonucleotides 
each (25 bases) in an 8-hr day. The entire set of PCR products 
was then analyzed by gel electrophoresis (Fig. 3). The success 
rate for amplifying a product of the correct length on the first 
attempt was 95%. This project demonstrates that 
one can go directly from sequence information to biological 
analysis in a truly automated, totally directed manner. 
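The throughput figures above allow a back-of-the-envelope estimate for a whole genome. The sketch makes the following assumptions:

- about 6,000 ORFs for S. cerevisiae (the figure in the Fig. 2 legend);
- two primers per ORF;
- three 96-well plates per 8-hr day, as stated for the current AMOS instrument;
- the 95% first-attempt amplification success rate.

It ignores resynthesis, instrument downtime and PCR set-up time.

```python
import math


def synthesis_days(n_orfs: int, primers_per_orf: int = 2,
                   plates_per_day: int = 3, wells_per_plate: int = 96) -> int:
    """Whole days of synthesizer time needed to make every primer once."""
    oligos = n_orfs * primers_per_orf
    per_day = plates_per_day * wells_per_plate  # 288 oligonucleotides per day
    return math.ceil(oligos / per_day)


def expected_failures(n_orfs: int, success_rate: float = 0.95) -> float:
    """Expected number of ORFs that fail to amplify on the first attempt."""
    return n_orfs * (1 - success_rate)


print(synthesis_days(6000))                 # whole yeast genome: 12,000 primers
print(f"{expected_failures(240):.0f}")      # 240-ORF chromosome V pilot
```

Under these assumptions the whole genome needs 42 synthesizer days, and about 12 of the 240 chromosome V ORFs would be expected to need a second attempt.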

These amplicons can be incorporated directly in arrays or 
the amplicons can be cloned. If the amplicons are to be cloned, 
novel sequences can be incorporated at the 5' end of the 
oligonucleotide to facilitate cloning. One potential problem 
with cloning PCR products is that the cloned amplicons may 
contain sequence alterations that diminish their utility. One 
option would be to resequence each individual amplicon. 
However, this is expensive, inefficient, and time consuming. A 
faster, more cost-effective, and more accurate approach is to 
apply comparative sequencing by denaturing HPLC (9). This 
method is capable of detecting a single base change in a 2-kb 
heteroduplex. Longer amplicons can be analyzed by use of 
appropriate restriction fragments. If any change is detected in 
a clone, an alternate clone of the same region can be analyzed. 
Modifying the system to allow high throughput analysis by 
denaturing HPLC is also relatively simple and straightforward. 

If amplicons are used directly on arrays without cloning, it 
is important to note that, even if single PCR product bands are 
observed on gels, the PCR products will be contaminated with 
various amounts of other sequences. This contamination has 
the potential to affect the results in, for example, expression 
analysis. On the other hand, direct use of the amplicons is 
much less labor intensive and greatly decreases the occurrence 
of mistakes in clone identification, a ubiquitous problem 
associated with large clone set archiving and retrieving. 

Fig. 1. Overview of systematic method for isolating individual 
genes. Sequence information is obtained automatically from sequence 
databases. The data are input into primer selection software specifically 
designed to target ORFs as designated by database annotations. 
The output file containing the primer information is directly read by 
a high-throughput oligonucleotide synthesizer, which makes the 
oligonucleotides in 96-well plates (AMOS, automated multiplex 
oligonucleotide synthesizer). The forward and reverse primers are 
synthesized in the same location on separate plates to facilitate the 
downstream handling of primers. The amplicons are generated by PCR in 
96-well plates as well. 

Any large-scale effort to capture each ORF within a genome 
must rely on automation if cost is to be minimized while 
efficiency is maximized. Toward that end, primers targeting 
ORFs were designed automatically using simple new scripts 
and existing primer selection software. These script-selected 
primer sequences were directly read by the high-throughput 
synthesizer and the forward and reverse primers were 
synthesized in separate plates in corresponding wells to facilitate 
automated pipetting and PCR amplifications. Each of the 
resulting PCR products, generated with minimum labor, 
contains a known, unique ORF. 
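The automated pipeline described above can be sketched in outline: an annotated ORF goes in, and a paired set of primers comes out, with forward and reverse primers placed in corresponding wells of separate plates. This is not the authors' scripts or their primer selection software. It naively takes fixed 25-base primers (the oligonucleotide length quoted earlier) from the two ends of each ORF. It makes no check of melting temperature or genome-wide uniqueness, both of which real primer selection software would handle. The ORF names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orf_primers(orf_seq: str, length: int = 25) -> tuple[str, str]:
    """Naive primer pair: first `length` bases, and reverse complement of the last."""
    if len(orf_seq) < 2 * length:
        raise ValueError("ORF shorter than two primer lengths")
    return orf_seq[:length].upper(), reverse_complement(orf_seq[-length:]).upper()


def plate_layout(orfs: dict[str, str]) -> list[tuple[int, str, str, str, str]]:
    """Assign each ORF a plate number and well (A1..H12).

    Forward primers go on one plate and reverse primers on a partner plate at
    the same well position, so each pair can be pipetted together.
    Returns rows of (plate, well, orf_name, forward, reverse).
    """
    rows = []
    for i, (name, seq) in enumerate(orfs.items()):
        plate, index = divmod(i, 96)
        well = f"{'ABCDEFGH'[index // 12]}{index % 12 + 1}"
        fwd, rev = orf_primers(seq)
        rows.append((plate + 1, well, name, fwd, rev))
    return rows


# Hypothetical 100 bp ORF, repeated to fill more than one plate.
toy_orf = "ATG" + "A" * 22 + "C" * 50 + "G" * 22 + "TAA"
layout = plate_layout({f"ORF{i}": toy_orf for i in range(97)})
print(layout[0])
print(layout[96][:3])
```

The 97th ORF spills onto a second pair of plates at well A1. In the real system this bookkeeping is what lets the output file be read directly by the synthesizer and the PCR set-up to be automated.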

Large-scale genome analysis projects are dependent on 
newly emerging technologies to make the studies practical and 
economically feasible. For example, the cost of the primers, a 
significant issue in the past, has been reduced dramatically to 
make feasible this and other projects that require tens of 
thousands of oligonucleotides. Other methods of high- 
throughput analysis are also vital to the success of functional 
analysis projects, such as microarraying and oligonucleotide 
chip methods (10-14). 

Changes in attitude are also required. One of the major costs 
of commercial oligonucleotides is extensive quality control 
such that virtually 100% of the supplied oligonucleotides are 
successfully synthesized and work for their intended purpose. 
Fig. 2. Overall approach for using the database of a genome to direct biological analysis. The 6,000 ORFs synthesized for the genes of S. cerevisiae can be used in many applications utilizing both cloning and microarraying technology.

Considerable cost reduction can be obtained by simply de- 
creasing the expected successful synthesis rate to 95-97%. One 
can then achieve faster and cheaper whole genome coverage by 
simply adding a single quality control at the end of the 
experiment and batching the failures for resynthesis. 
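The cost argument above can be made concrete with a back-of-the-envelope calculation: at a per-oligonucleotide success rate p, a batch of N oligonucleotides yields about N(1 - p) failures to batch for resynthesis. The sketch below assumes that failures are independent and that an ORF must be redone whenever either of its two primers fails; both are simplifying assumptions, and the 6,000-ORF figure is taken from the yeast example.

```python
def expected_failures(n_oligos: int, success_rate: float) -> float:
    """Expected number of failed syntheses in a batch (independent failures assumed)."""
    return n_oligos * (1.0 - success_rate)


def orfs_needing_resynthesis(n_orfs: int, success_rate: float) -> float:
    """Expected number of ORFs with at least one failed primer (forward or reverse)."""
    pair_success = success_rate ** 2  # both primers of the pair must work
    return n_orfs * (1.0 - pair_success)


n_orfs = 6000
n_oligos = 2 * n_orfs  # one forward and one reverse primer per ORF
for p in (0.95, 0.97):
    print(p, round(expected_failures(n_oligos, p)), round(orfs_needing_resynthesis(n_orfs, p)))
# 0.95 600 585
# 0.97 360 355
```

A few hundred failures, collected by a single quality-control step at the end, is a small resynthesis batch compared with paying for near-100% quality control on all 12,000 oligonucleotides.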

The directed nature of the amplicon approach is of clear 
advantage. The sequence of each ORF is analyzed automati- 
cally, and unique specific primers are made to target each 
ORF. Thus, relatively little time or labor is involved; for example, no random cloning and subsequent screening is
required because each product is known. In the test system, 
primers for 240 ORFs from chromosome V were systematically 
synthesized, beginning from the left arm and continuing 
through to the right arm. At no point was there any manual 
analysis of sequence information to generate the collection. In 
many ways, now that the sequence is known, there is no need 
for the researcher to examine it. 

These amplicons can be arrayed and expression analysis can 
be done on all arrayed ORFs with a single hybridization (10). 
Those ORFs that display significant differential expression 
patterns under a given selection are easily identified without 
the laborious task of searching for and then sequencing a clone. 
Once scaled up, the procedure provides even greater returns 
on effort, because a single hybridization will ultimately provide 
a "snapshot" of the expression of all genes in the yeast genome. 
Thus, the limiting factor in whole genome analysis will not be 
the analysis process itself, but will instead be the ability of 
researchers to design and carry out experimental selections. 
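Picking out ORFs with significant differential expression from one two-color hybridization can, in its simplest form, be a threshold on the log ratio of the two channels. The sketch below is illustrative only: the twofold cutoff, the function name `flag_differential`, and the example ratios are assumptions, and a real analysis would also need background correction, normalization, and replicate-based statistics.

```python
import math
from typing import Dict, List, Tuple


def flag_differential(ratios: Dict[str, float], fold: float = 2.0) -> List[Tuple[str, float]]:
    """Return (ORF, log2 ratio) pairs with |log2 ratio| >= log2(fold),
    sorted from the largest absolute change downward.
    Non-positive ratios (e.g., empty spots) are skipped."""
    cutoff = math.log2(fold)
    hits = [
        (orf, math.log2(r))
        for orf, r in ratios.items()
        if r > 0 and abs(math.log2(r)) >= cutoff
    ]
    return sorted(hits, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical test/control ratios for a handful of arrayed ORFs.
ratios = {"ORF_A": 4.0, "ORF_B": 1.1, "ORF_C": 0.25, "ORF_D": 0.9, "ORF_E": 2.5}
for orf, lr in flag_differential(ratios):
    print(orf, round(lr, 2))
```

Working on the log scale makes induction and repression symmetric: a fourfold increase and a fourfold decrease both have an absolute log2 ratio of 2.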

Current expression and genetic analysis technologies are 
geared toward the analysis of single genes and are ill suited to 
analyze numerous genes under many conditions. Additional 
difficulties with current technologies include: the effort and 
expense required to analyze expression and make mutants, the 
potential duplication of effort if done by different laboratories, 
and the possibility of conflicting results obtained from different laboratories. In contrast, whole genome analysis is not only more efficient but also provides data of much higher quality; all genes are assayed and compared in parallel under exactly the same conditions.

Fig. 3. Gel image of amplifications. Using the method described in V. One plate of 96 amplification reactions is shown.

In addition, amplicons have many applications beyond gene expression. For example, one recent
approach is to incorporate a unique DNA sequence tag, 
synthesized as part of each gene specific primer, during 
amplification. The tags or molecular bar codes, when reintro- 
duced into the organism as a gene deletion or as a gene clone, 
can be used much more efficiently than individual mutations 
or clones because pools of tagged mutants or transformants 
can be analyzed in parallel. This parallel analysis is possible 
because the tags are readily and quantitatively amplified even 
in complex mixtures of tags (13). 
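Because each tagged mutant or transformant carries a unique bar code, every member of a pool can be followed at once by quantifying its tag. A minimal sketch, assuming tag signals have already been measured in the pool before and after some selection, that every tag is present (nonzero) before selection, and that the tag names and values are hypothetical; each signal is converted to a fraction of its own pool so that differences in total yield between the two samples cancel.

```python
import math
from typing import Dict


def tag_log2_changes(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """log2 change in the relative abundance of each tag between two pooled samples."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    changes = {}
    for tag, signal in before.items():
        frac_before = signal / total_before  # assumes signal > 0 before selection
        frac_after = after.get(tag, 0.0) / total_after
        # A tag that disappears after selection gets -inf (strain lost from the pool).
        changes[tag] = math.log2(frac_after / frac_before) if frac_after > 0 else float("-inf")
    return changes


before = {"tagA": 100.0, "tagB": 100.0, "tagC": 100.0, "tagD": 100.0}
after = {"tagA": 200.0, "tagB": 100.0, "tagC": 100.0, "tagD": 0.0}
print(tag_log2_changes(before, after))
```

Here tagA doubles its share of the pool (log2 change 1.0), tagB and tagC are unchanged, and tagD drops out, which is the kind of parallel readout the tagging strategy is meant to provide.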

These ORF genome arrays and oligonucleotide tagged 
libraries can be used for many applications. Any conventional 
selection applied to a library that gives discrete or multiple 
products can use these technologies for a simple direct read- 
out. These include screens and selections for mutant comple- 
mentation, overexpression suppression (15, 16), second-site 
suppressors, synthetic lethality, drug target overexpression 
(17), two-hybrid screens (18), genome mismatch scanning (19), 
or recombination mapping. 

The genome projects have provided researchers with a vast 
amount of information. These data must be used efficiently 
and systematically to gain a truly comprehensive understand- 
ing of gene function and, more broadly, of the entire genome 
which can then be applied to other organisms. Such global 
approaches are essential if we are to gain an understanding of 
the living cell. This understanding should come from the 
viewpoint of the integration of complex regulatory networks, 
the individual roles and interactions of thousands of functional 
gene products, and the effect of environmental changes on 
both gene regulatory networks and the roles of all gene 
products. The time has come to switch from the analysis of a 
single gene to the analysis of the whole genome. 

Support was provided by National Institutes of Health Grants 
R37H60198 and P01H600205. 
The availability of genome-scale DNA sequence information and reagents has radically altered life-science
research. This revolution has led to the development of a new scientific subdiscipline derived from a combina- 
tion of the fields of toxicology and genomics. This subdiscipline, termed toxicogenomics, is concerned with the 
identification of potential human and environmental toxicants, and their putative mechanisms of action, through 
the use of genomics resources. One such resource is DNA microarrays or "chips," which allow the monitoring of 
the expression levels of thousands of genes simultaneously. Here we propose a general method by which gene 
expression, as measured by cDNA microarrays, can be used as a highly sensitive and informative marker for 
toxicity. Our purpose is to acquaint the reader with the development and current state of microarray technol- 
ogy and to present our view of the usefulness of microarrays to the field of toxicology. Mol. Carcinog. 24:153- 
INTRODUCTION 

Technological advancements combined with in- 
tensive DNA sequencing efforts have generated an 
enormous database of sequence information over the 
past decade. To date, more than 3 million sequences, 
totaling over 2.2 billion bases [1], are contained 
within the GenBank database, which includes the 
complete sequences of 19 different organisms [2]. The 
first complete sequence of a free-living organism, 
Haemophilus influenzae, was reported in 1995 [3] and 
was followed shortly thereafter by the first complete 
sequence of a eukaryote, Saccharomyces cerevisiae [4]. 
The development of dramatically improved sequenc- 
ing methodologies promises that complete elucida- 
tion of the Homo sapiens DNA sequence is not far 
behind [5]. 

To exploit more fully the wealth of new sequence 
information, it was necessary to develop novel meth- 
ods for the high-throughput or parallel monitoring 
of gene expression. Established methods such as 
northern blotting, RNase protection assays, S1 nuclease analysis, plaque hybridization, and slot blots 
do not provide sufficient throughput to effectively 
utilize the new genomics resources. Newer methods 
such as differential display [6], high-density filter 
hybridization [7,8], serial analysis of gene expression 
[9], and cDNA- and oligonucleotide-based microarray 
"chip" hybridization [10-12] are possible solutions 
to this bottleneck. It is our belief that the microarray 
approach, which allows the monitoring of expres- 
sion levels of thousands of genes simultaneously, is 
a tool of unprecedented power for use in toxicology 
studies. 



Almost without exception, gene expression is al- 
tered during toxicity, as either a direct or indirect 
result of toxicant exposure. The challenge facing 
toxicologists is to define, under a given set of ex- 
perimental conditions, the characteristic and spe- 
cific pattern of gene expression elicited by a given 
toxicant. Microarray technology offers an ideal plat- 
form for this type of analysis and could be the foun- 
dation for a fundamentally new approach to 
toxicology testing. 

MICROARRAY DEVELOPMENT AND APPLICATIONS 

cDNA Microarrays 

In the past several years, numerous systems were 
developed for the construction of large-scale DNA 
arrays. All of these platforms are based on cDNAs 
or oligonucleotides immobilized to a solid sup- 
port. In the cDNA approach, cDNA (or genomic) 
clones of interest are arrayed in a multi-well for- 
mat and amplified by polymerase chain reaction. 
The products of this amplification, which are usu- 
ally 500- to 2000-bp clones from the 3' regions of 
the genes of interest, are then spotted onto a solid 
support by using high-speed robotics. By using 
this method, microarrays of up to 10 000 clones 
can be generated by spotting onto a glass substrate 
[13,14]. Sample detection for microarrays on glass 
involves the use of probes labeled with fluores- 
cent or radioactive nucleotides. 

Fluorescent cDNA probes are generated from con- 
trol and test RNA samples in single-round reverse-tran- 
scription reactions in the presence of fluorescently 
tagged dUTP (e.g., Cy3-dUTP and Cy5-dUTP), which 
produces control and test products labeled with dif- 
ferent fluors. The cDNAs generated from these two 
populations, collectively termed the "probe," are then 
mixed and hybridized to the array under a glass cov- 
erslip [10,11,15]. The fluorescent signal is detected 
by using a custom-designed scanning confocal mi- 
croscope equipped with a motorized stage and lasers 
for fluor excitation [10,11,15]. The data are analyzed 
with custom digital image analysis software that de- 
termines for each DNA feature the ratio of fluor 1 to 
fluor 2, corrected for local background [16,17]. The 
strength of this approach lies in the ability to label 
RNAs from control and treated samples with differ- 
ent fluorescent nucleotides, allowing for the simul- 
taneous hybridization and detection of both 
populations on one microarray. This method elimi- 
nates the need to control for hybridization between 
arrays. The research groups of Drs. Patrick Brown and 
Ron Davis at Stanford University spearheaded the 
effort to develop this approach, which has been suc- 
cessfully applied to studies of Arabidopsis thaliana 
RNA [10], yeast genomic DNA [15], tumorigenic ver- 
sus non-tumorigenic human tumor cell lines [11], 
human T-cells [18], yeast RNA [19], and human in- 
flammatory disease-related genes [20]. The most dra- 
matic result of this effort was the first published 
account of gene expression of an entire genome, that 
of the yeast Saccharomyces cerevisiae [21]. 
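The per-feature calculation described above, the ratio of fluor 1 to fluor 2 after correcting each channel for local background, can be sketched as follows. The field names, the use of the median of the pixels surrounding the spot as the local background, and the floor of 1 intensity unit that keeps the ratio finite are illustrative assumptions, not the behavior of the custom image-analysis software cited.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class Feature:
    spot_ch1: float      # mean spot intensity, fluor 1 channel
    spot_ch2: float      # mean spot intensity, fluor 2 channel
    bg_ch1: List[float]  # pixel intensities surrounding the spot, fluor 1
    bg_ch2: List[float]  # pixel intensities surrounding the spot, fluor 2


def corrected_ratio(f: Feature, floor: float = 1.0) -> float:
    """Fluor 1 / fluor 2 after subtracting the local (median) background in each channel.
    Values at or below background are clamped to `floor` to avoid division by zero."""
    s1 = max(f.spot_ch1 - median(f.bg_ch1), floor)
    s2 = max(f.spot_ch2 - median(f.bg_ch2), floor)
    return s1 / s2


feat = Feature(spot_ch1=1200.0, spot_ch2=500.0,
               bg_ch1=[190.0, 200.0, 210.0], bg_ch2=[95.0, 100.0, 105.0])
print(corrected_ratio(feat))  # (1200 - 200) / (500 - 100) = 2.5
```

Because both labeled populations hybridize to the same spot, the ratio compares them directly without any correction for spot-to-spot or array-to-array differences in the amount of DNA deposited.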

In an alternative approach, large numbers of cDNA 
clones can be spotted onto a membrane support, al- 
beit at a lower density [7,22]. This method is useful 
for expression profiling and large-scale screening and 
mapping of genomic or cDNA clones [7,22-24]. In 
expression profiling on filter membranes, two dif- 
ferent membranes are used simultaneously for con- 
trol and test RNA hybridizations, or a single 
membrane is stripped and reprobed. The signal is 
detected by using radioactive nucleotides and visu- 
alized by phosphorimager analysis or autoradiogra- 
phy. Numerous companies now sell such cDNA 
membranes and software to analyze the image data 
[25-27]. 

Oligonucleotide Microarrays 

Oligonucleotide microarrays are constructed either 
by spotting prefabricated oligos on a glass support 
[13] or by the more elegant method of direct in situ 
oligo synthesis on the glass surface by photolithography [28-30]. The strength of this approach lies in its ability to discriminate DNA molecules based on single base-pair differences. This allows the application of this method to the fields of medical diagnostics, pharmacogenetics, and sequencing by hybridization as well as gene-expression analysis. 

Fabrication of oligonucleotide chips by photoli- 
thography is theoretically simple but technically 
complex [29,30]. The light from a high-intensity 
mercury lamp is directed through a photolitho- 
graphic mask onto the silica surface, resulting in 
deprotection of the terminal nucleotides in the illu- 
minated regions. The entire chip is then reacted with 
the desired free nucleotide, resulting in selected chain 
elongation. This process requires only 4n cycles 
(where n = oligonucleotide length in bases) to syn- 
thesize a vast number of unique oligos, the total num- 
ber of which is limited only by the complexity of the 
photolithographic mask and the chip size [29,31,32]. 
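The 4n bound follows because each cycle couples a single base type (A, C, G, or T) to whichever features the mask exposes; stepping through all four bases at each of the n positions therefore builds any set of n-mers, however many distinct sequences it contains. The toy simulation below illustrates only this counting argument: the target sequences are hypothetical, and the chemistry is reduced to string concatenation.

```python
from typing import List, Tuple


def photolithographic_synthesis(targets: List[str]) -> Tuple[List[str], int]:
    """Simulate mask-directed in situ synthesis of equal-length oligos.

    Each cycle 'illuminates' (deprotects) the features whose target has the
    current base at the current position, then couples that base to every
    deprotected feature. Returns (synthesized oligos, number of cycles).
    """
    n = len(targets[0])
    assert all(len(t) == n for t in targets), "all probes must share length n"
    features = [""] * len(targets)  # growing chain at each feature
    cycles = 0
    for pos in range(n):
        for base in "ACGT":
            mask = [t[pos] == base for t in targets]  # features exposed this cycle
            for i, exposed in enumerate(mask):
                if exposed:
                    features[i] += base
            cycles += 1
    return features, cycles


targets = ["ACGTA", "TTGCA", "GGGGC", "CATGT"]
oligos, cycles = photolithographic_synthesis(targets)
print(oligos == targets, cycles)  # True 20  (4 x 5)
print(4 ** 5)                     # 1024 possible 5-mers, all reachable in the same 20 cycles
```

The cycle count depends only on probe length, not on the number of probes, which is why the number of distinct oligos is limited by mask complexity and chip size rather than by synthesis time.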

Sample preparation involves the generation of 
double-stranded cDNA from cellular poly(A)+ RNA 
followed by antisense RNA synthesis in an in vitro 
transcription reaction with biotinylated or fluor-tagged nucleotides. The RNA probe is then fragmented to facilitate hybridization. If the indirect 
visualization method is used, the chips are incubated 
with fluor-linked streptavidin (e.g., phycoerythrin) 
after hybridization [12,33]. The signal is detected with 
a custom confocal scanner [34]. This method has 
been applied successfully to the mapping of genomic 
library clones [35], to de novo sequencing by hybrid- 
ization [28,36], and to evolutionary sequence com- 
parison of the BRCA1 gene [37]. In addition, 
mutations in the cystic fibrosis [38] and BRCA1 [39] 
gene products and polymorphisms in the human immunodeficiency virus-1 clade B protease gene [40] 
have been detected by this method. Oligonucleotide 
chips are also useful for expression monitoring [33] 
as has been demonstrated by the simultaneous evalu- 
ation of gene-expression patterns in nearly all open 
reading frames of the yeast S. cerevisiae [12]. 
More recently, oligonucleotide chips have been used 
to help identify single nucleotide polymorphisms in 
the human [41] and yeast [42] genomes. 

THE USE OF MICROARRAYS IN TOXICOLOGY 

Screening for Mechanism of Action 

The field of toxicology uses numerous in vivo 
model systems, including the rat, mouse, and rab- 
bit, to assess potential toxicity and these bioassays 
are the mainstay of toxicology testing. However, in 
the past several decades, a plethora of in vitro tech- 
niques have been developed to measure toxicity, 
many of which measure toxicant-induced DNA dam- 
age. Examples of these assays include the Ames test, 
the Syrian hamster embryo cell transformation as- 
say, micronucleus assays, measurements of sister 
chromatid exchange and unscheduled DNA synthe- 
sis, and many others. Fundamental to all of these 
methods is the fact that toxicity is often preceded 
by, and results in, alterations in gene expression. In 
many cases, these changes in gene expression are a 
far more sensitive, characteristic, and measurable 
endpoint than the toxicity itself. We therefore pro- 
pose that a method based on measurements of the 
genome-wide gene expression pattern of an organ- 
ism after toxicant exposure is fundamentally infor- 
mative and complements the established methods 
described above. 

We are developing a method by which toxicants 
can be identified and their putative mechanisms of 
action determined by using toxicant-induced gene ex- 
pression profiles. In this method, in one or more de- 
fined model systems, dose and time-course parameters 
are established for a series of toxicants within a given 
prototypic class (e.g., polycyclic aromatic hydrocar- 
bons (PAHs)). Cells are then treated with these agents 
at a fixed toxicity level (as measured by cell survival), 
RNA is harvested, and toxicant-induced gene expres- 
sion changes are assessed by hybridization to a cDNA 
microarray chip (Figure 1). We have developed a custom DNA chip, called ToxChip v1.0, specifically for 
this purpose and will discuss it in more detail below. 
The changes in gene expression induced by the test 
agents in the model systems are analyzed, and the 
common set of changes unique to that class of toxi- 
cants, termed a toxicant signature, is determined. 

This signature is derived by ranking across all experiments the gene-expression data based on relative fold induction or suppression of genes in treated 
samples versus untreated controls and selecting the 
most consistently different signals across the sample 
set. A different signature may be established for each 
prototypic toxicant class. Once the signatures are de- 
termined, gene-expression profiles induced by un- 
known agents in these same model systems can then 
be compared with the established signatures. A match 
assigns a putative mechanism of action to the test 
compound. Figure 2 illustrates this signature method 
for different types of oxidant stressors, PAHs, and 
peroxisome proliferators. In this example, the un- 
known compound in question had a gene-expres- 
sion profile similar to that of the oxidant stressors in 
the database. We anticipate that this general method 
will also reveal cross talk between different pathways 
induced by a single agent (e.g., reveal that a com- 
pound has both PAH-like and oxidant-like proper- 
ties). In the future, it may be necessary to distinguish 
very subtle differences between compounds within 
a very large sample set (e.g., thousands of highly simi- 
lar structural isomers in a combinatorial chemistry 
library or peptide library). To generate these highly 
refined signatures, standard statistical clustering tech- 
niques or principal-component analysis can be used. 

For the studies outlined in Figure 2, we developed 
the custom cDNA microarray chip ToxChip v1.0. 
Figure 1. Simplified overview of the method for sample preparation and hybridization to cDNA microarrays. For illustrative purposes, samples derived from cell culture are depicted, although other sample types are amenable to this analysis.
Figure 2. Schematic representation of the method for identification of a toxicant's mechanism of action. In this method, gene-expression data derived from exposure of model systems to known toxicants are analyzed, and a set of changes characteristic of that type of toxicant (termed the toxicant signature) is identified. As depicted, oxidant stressors produce consistent changes in group A genes (indicated by red and green circles), but not group B or C genes (indicated by gray circles). The set of gene-expression changes elicited by the suspected toxicant is then compared with these characteristic patterns, and a putative mechanism of action is assigned to the unknown agent.



The 2090 human genes that comprise this subarray 
were selected for their well-documented involve- 
ment in basic cellular processes as well as their re- 
sponses to different types of toxic insult. Included 
on this list are DNA replication and repair genes, 
apoptosis genes, and genes responsive to PAHs and 
dioxin-like compounds, peroxisome proliferators, 
estrogenic compounds, and oxidant stress. Some of 
the other categories of genes include transcription 
factors, oncogenes, tumor suppressor genes, cyclins, 
kinases, phosphatases, cell adhesion and motility 
genes, and homeobox genes. Also included in this 
group are 84 housekeeping genes, whose hybridiza- 
tion intensity is averaged and used for signal nor- 
malization of the other genes on the chip. To date, 
very few toxicants have been shown to have appre- 
ciable effects on the expression of these housekeep- 
ing genes. However, this housekeeping list will be 
revised if new data warrant the addition or deletion 
of a particular gene. Table 1 contains a general de- 
scription of some of the different classes of genes 
that comprise ToxChip v1.0. 
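Normalization against the averaged housekeeping genes, as described above, can be sketched in a few lines: each channel's intensities are divided by that channel's housekeeping mean, so a gene whose raw ratio departs from 1 only because one channel is uniformly brighter comes out near 1 after scaling. A plain arithmetic mean is assumed because the text says only that the housekeeping intensities are averaged; gene names and values are hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable


def normalize_to_housekeeping(intensities: Dict[str, float],
                              housekeeping: Iterable[str]) -> Dict[str, float]:
    """Scale one channel so that its housekeeping genes average to 1.0."""
    hk_mean = mean(intensities[g] for g in housekeeping)
    return {g: v / hk_mean for g, v in intensities.items()}


hk = ["hk1", "hk2", "hk3"]
control = {"hk1": 1000.0, "hk2": 800.0, "hk3": 1200.0, "geneX": 500.0}
treated = {"hk1": 2000.0, "hk2": 1600.0, "hk3": 2400.0, "geneX": 1000.0}  # uniformly 2x brighter

c_norm = normalize_to_housekeeping(control, hk)
t_norm = normalize_to_housekeeping(treated, hk)
print(treated["geneX"] / control["geneX"])  # 2.0 raw, here purely a channel-intensity offset
print(t_norm["geneX"] / c_norm["geneX"])    # 1.0 after normalization
```

This only works if the toxicant leaves the housekeeping genes unchanged, which is why the housekeeping list is subject to revision as new data arrive.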

When a toxicant signature is determined, the 
genes within this signature are flagged within the 
database. When uncharacterized toxicants are then 
screened, the data can be quickly reformatted so that 
blocks of genes representing the different signatures are displayed [11]. This facilitates rapid, visual interpretation of data. We are also developing ToxChip v2.0 and chips for other model systems, 
including rat, mouse, Xenopus, and yeast, for use in 
toxicology studies. 

Animal Models in Toxicology Testing 

The toxicology community relies heavily on the 
use of animals as model systems for toxicology test- 
ing. Unfortunately, these assays are inherently ex- 
pensive, require large numbers of animals and take a 
long time to complete and analyze. Therefore, the 
National Institute of Environmental Health Sciences 
(NIEHS), the National Toxicology Program, and the 
toxicology community at large are committed to re- 
ducing the number of animals used, by developing 
more efficient and alternative testing methodologies. 
Although substantial progress has been made in the 
development of alternative methods, bioassays are 
still used for testing endpoints such as neurotoxic- 
ity, immunotoxicity, reproductive and developmen- 
tal toxicology, and genetic toxicology. The rodent 
cancer bioassay is a particularly expensive and time- 
consuming assay, as it requires almost 4 yr, 1200 
animals, and millions of dollars to execute and analyze [43]. In vitro experiments of the type outlined in Figure 2 might provide evidence that an unknown agent is (or is not) responsible for eliciting a given biological response. This information would help to select a bioassay more specifically suited to the agent in question or perhaps suggest that a bioassay is not necessary, which would dramatically reduce cost, animal use, and time.

Table 1. ToxChip v1.0: A Human cDNA Microarray Chip Designed to Detect Responses to Toxic Insult*

Gene category                            No. of genes on chip
Apoptosis                                 72
DNA replication and repair                99
Oxidative stress/redox homeostasis        90
Peroxisome proliferator responsive        22
Dioxin/PAH responsive                     12
Estrogen responsive                       63
Housekeeping                              84
Oncogenes and tumor suppressor genes      76
Cell-cycle control                        51
Transcription factors                    131
Kinases                                  276
Phosphatases                              88
Heat-shock proteins                       23
Receptors                                349
Cytochrome P450s                          30

*This list is intended as a general guide. The gene categories are not unique, and some genes are listed in multiple categories.

The addition of microarray techniques to stan- 
dard bioassays may dramatically enhance the sen- 
sitivity and interpretability of the bioassay and 
possibly reduce its cost. Gene-expression signatures 
could be determined for various types of tissue-spe- 
cific toxicants, and new compounds could be 
screened for these characteristic signatures, provid- 
ing a rapid and sensitive in vivo test. Also, because 
gene expression is often exquisitely sensitive to low 
doses of a toxicant, the combination of gene-expres- 
sion screening and the bioassay might allow the use 
of lower toxicant doses, which are more relevant to 
human exposure levels, and the use of fewer ani- 
mals. In addition, gene-expression changes are nor- 
mally measured in hours or days, not in the months 
to years required for tumor development. Further- 
more, microarrays might be particularly useful for 
investigating the relationship between acute and 
chronic toxicity and identifying secondary effects 
of a given toxicant by studying the relationship 
between the duration of exposure to a toxicant and 
the gene-expression profile produced. Thus, a bio- 
assay that incorporates gene-expression signatures 
with traditional endpoints might be substantially 
shorter, use more realistic dose regimens, and cost 
substantially less than the current assays do. 

These considerations are also relevant for branches 
of toxicology not related to human health and not 
using rodents as model systems, such as aquatic toxicology and plant pathology. Bioassays based on the fathead minnow, Daphnia, and Arabidopsis could also be improved by the addition of microarray analysis. The combination of microarrays with traditional 
bioassays might also be useful for investigating some 
of the more intractable problems in toxicology re- 
search, such as the effects of complex mixtures and 
the difficulties in cross-species extrapolation. 

Exposure Assessment, Environmental Monitoring, 
and Drug Safety 

The currently used methods for assessment of ex- 
posure to chemical toxicants are based on measure- 
ment of tissue toxin levels or on surrogate markers 
of toxicity, termed biomarkers (e.g., peripheral blood 
levels of hepatic enzymes or DNA adducts). Because 
gene expression is a sensitive endpoint, gene expres- 
sion as measured with microarray technology may 
be useful as a new biomarker to more precisely iden- 
tify hazards and to assess exposure. Similarly, 
microarrays could be used in an environmental- 
monitoring capacity to measure the effect of poten- 
tial contaminants on the gene-expression profiles 
of resident organisms. In an analogous fashion, 
microarrays could be used to measure gene-expres- 
sion endpoints in subjects in clinical trials. The com- 
bination of these gene-expression data and more 
established toxic endpoints in these trials could be 
used to define highly precise surrogates of safety. 

Gene-expression profiles in samples from exposed 
individuals could be compared to the profiles of the 
same individuals before exposure. From this infor- 
mation, the nature of the toxic exposure can be de- 
termined or a relative clinical safety factor estimated. 
In the future it may also be possible to estimate not 
only the nature but the dose of the toxicant for a 
given exposure, based on relative gene-expression 
levels. This general approach may be particularly 
appropriate for occupational-health applications, in 
which unexposed and exposed samples from the 
same individuals may be obtainable. For example, 
a pilot study of gene expression in peripheral-blood 
lymphocytes of Polish coke-oven workers exposed 
to PAHs (and many other compounds) is under con- 
sideration at the NIEHS. An important consideration 
for these types of studies is that gene expression can 
be affected by numerous factors, including diet, 
health, and personal habits. To reduce the effects 
of these confounding factors, it may be necessary 
to compare pools of control samples with pools of 
treated samples. In the future it may be possible to 
compare exposed sample sets to a national database 
of human-expression data, thus eliminating the 
need to provide an unexposed sample from the same 
individual. Efforts to develop such a national gene- 
expression database are currently under way [44,45]. 
However, this national database approach will re- 
quire a better understanding of genome-wide gene 
expression across the highly diverse human popu- 
lation and of the effects of environmental factors 
on this expression. 
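The two comparison designs mentioned above, an individual's post-exposure profile against the same individual's pre-exposure profile, and a pool of exposed samples against a pool of controls, differ only in what serves as the reference. A minimal sketch, assuming expression values are already normalized and using hypothetical genes and values; pooling is modeled by averaging individual profiles, which approximates but does not exactly reproduce physically mixing RNA before hybridization.

```python
import math
from statistics import mean
from typing import Dict, List

Profile = Dict[str, float]  # gene -> normalized expression level


def log2_ratio(test: Profile, reference: Profile) -> Profile:
    """Per-gene log2(test / reference)."""
    return {g: math.log2(test[g] / reference[g]) for g in reference}


def pool(profiles: List[Profile]) -> Profile:
    """Average several individuals' profiles gene by gene (in silico pooling)."""
    return {g: mean(p[g] for p in profiles) for g in profiles[0]}


# Paired design: the same individual before and after exposure.
before = {"g1": 1.0, "g2": 2.0}
after = {"g1": 4.0, "g2": 2.0}
print(log2_ratio(after, before))  # {'g1': 2.0, 'g2': 0.0}

# Pooled design: exposed group versus unrelated controls; averaging damps
# individual variation due to diet, health, and personal habits.
controls = [{"g1": 0.8, "g2": 2.2}, {"g1": 1.2, "g2": 1.8}]
exposed = [{"g1": 3.0, "g2": 1.9}, {"g1": 5.0, "g2": 2.1}]
print(log2_ratio(pool(exposed), pool(controls)))
```

The paired design removes between-person differences by construction, while the pooled design trades that control for not needing a pre-exposure sample.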
Alleles, Oligo Arrays, and Toxicogenetics 

Gene sequences vary between individuals, and 
this variability can be a causative factor in human 
diseases of environmental origin [46,47]. A new area 
of toxicology, termed toxicogenetics, was recently 
developed to study the relationship between genetic 
variability and toxicant susceptibility. This field is 
not the subject of this discussion, but it is worth- 
while to note that the ability of oligonucleotide ar- 
rays to discriminate DNA molecules based on single 
base-pair differences makes these arrays uniquely 
useful for this type of analysis. Recent reports dem- 
onstrated the feasibility of this approach [41,42]. 
The NIEHS has initiated the Environmental Genome 
Project to identify common sequence polymor- 
phisms in 200 genes thought to be involved in en- 
vironmental diseases [48]. In a pilot study on the 
feasibility of this application to the Environmental 
Genome Project, oligonucleotide arrays will be used 
to resequence 20 candidate genes. This toxicogenetic 
approach promises to dramatically improve our un- 
derstanding of interindividual variability in disease 
susceptibility. 

FUTURE PRIORITIES 

There are many issues that must be addressed be- 
fore the full potential of microarrays in toxicology 
research can be realized. Among these are model sys- 
tem selection, dose selection, and the temporal na- 
ture of gene expression. In other words, in which 
species, at what dose, and at what time do we look 
for toxicant-induced gene expression? If human 
samples are analyzed, how variable is global gene 
expression between individuals, before and after toxi- 
cant exposure? What are the effects of age, diet, and 
other factors on this expression? Experience, in the 
form of large data sets of toxicant exposures, will 
answer these questions. 

One of the most pressing issues for array scientists 
is the construction of a national public database 
(linked to the existing public databases) to serve as a 
repository for gene-expression data. This relational 
database must be made available for public use, and 
researchers must be encouraged to submit their ex- 
pression data so that others may view and query the 
information. Researchers at the National Institutes 
of Health have made laudable progress in develop- 
ing the first generation of such a database [44,45]. In 
addition, improved statistical methods for gene clus- 
tering and pattern recognition are needed to ana- 
lyze the data in such a public database. 

The proliferation of different platforms and meth- 
ods for microarray hybridizations will improve 
sample handling and data collection and analysis and 
reduce costs. However, the variety of microarray 
methods available will create problems of data compatibility between platforms. In addition, the near-infinite variety of experimental conditions under which data will be collected by different laboratories will make large-scale data analysis extremely difficult. To help circumvent these future problems, a 
set of standards to be included on all platforms 
should be established. These standards would facili- 
tate data entry into the national database and serve 
as reference points for cross-platform and inter-labo- 
ratory data analysis. 

Many issues remain to be resolved, but it is clear 
that new molecular techniques such as microarray 
hybridization will have a dramatic impact on toxicol- 
ogy research. In the future, the information gathered 
from microarray-based hybridization experiments will 
form the basis for an improved method to assess the 
impact of chemicals on human and environmental 
health. 
1. Introduction 

The majority of drugs act by binding to protein 
targets, most to known proteins representing en- 
zymes, receptors and channels, resulting in effects 
such as enzyme inhibition and impairment of 
signal transduction. The treatment-induced perturbations provoke feedback reactions aiming to 
compensate for the stimulus, which almost always 
are associated with signals to the nucleus, result- 
ing in altered gene expression. Such gene expres- 
sion regulations account for both the 
pharmacological action and the toxicity of a drug 
and can be visualized by either global mRNA or 
global protein expression profiling. Hence, for 
each individual drug, a characteristic gene regula- 
tion pattern, its molecular fingerprint, exists 
which bears valuable information on its mode of 
action and its mechanism of toxicity. 

Gene expression is a multistep process that 
results in an active protein (Fig. 1). Numerous regulatory systems exert control at and after the transcription and translation steps. Genomics, by definition, encompasses the quantitative analysis of transcripts at the mRNA level, while the aim of proteomics is to quantify gene expression further downstream, creating a snapshot of gene regulation closer to the ultimate control of cell function.
2. Global mRNA profiling

Expression data at the mRNA level can be produced using a range of technologies, such as DNA microarrays, reverse transcript imaging, amplified fragment length polymorphism, serial analysis of gene expression (SAGE) and others. Currently, DNA microarrays are very popular and show great promise. On a typical array, each gene of interest is represented either by a long DNA fragment (200-2400 bp), typically generated by polymerase chain reaction (PCR) and spotted onto a suitable substrate using robotics (Schena et al., 1995; Shalon et al., 1996), or by several short oligonucleotides (20-30 bp) synthesized directly on a solid support using photolabile nucleotide chemistry (Fodor et al., 1991; Chee et al., 1996). Total RNA or mRNA is isolated from control and treated tissues and reverse transcribed in the presence of radioactively or fluorescently labeled nucleotides, and the labeled probes are then hybridized to the arrays. The intensity of the array signal is measured for each gene transcript by either autoradiography or laser scanning confocal microscopy. The ratio between the signals of control and treated samples reflects the relative drug-induced change in transcript abundance.

3. Global protein profiling

Global quantitative expression analysis at the protein level is currently restricted to two-dimensional gel electrophoresis. This technique separates tissue proteins by isoelectric focusing in the first dimension and by molecular weight, using sodium dodecyl sulfate slab gel electrophoresis, in the second, orthogonal dimension (Anderson et al., 1991). The product is a rectangular pattern of protein spots that are typically revealed by Coomassie Blue, silver or fluorescent staining (Fig. 2). Protein spots are identified by mass spectrometry following generation of peptide mass fingerprints or sequence tags (… et al., 1996). As in the mRNA approach, the optical densities of corresponding spots from control and treated samples are compared, and their ratio is used to search for treatment-related changes.
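Both approaches reduce each gene or protein spot to a treated-versus-control ratio. The sketch below assumes background-corrected signals (array intensities or 2-D gel spot optical densities) are already available; the log2 transform, the 2-fold cutoff and the `floor` parameter are common illustrative conventions rather than anything specified in the text, and the function names are hypothetical.

```python
import math

def log2_ratios(control, treated, floor=1.0):
    """log2(treated / control) for every feature measured in both samples.

    control, treated: dicts mapping a feature ID (gene or protein spot) to a
    background-corrected signal. floor is an illustrative minimum signal
    that keeps near-zero values from producing infinite ratios.
    """
    ratios = {}
    for feature, c in control.items():
        if feature not in treated:
            continue  # not measured in the treated sample
        t = treated[feature]
        ratios[feature] = math.log2(max(t, floor) / max(c, floor))
    return ratios

def changed(ratios, min_fold=2.0):
    """Features changed at least min_fold in either direction, sorted by ID."""
    cutoff = math.log2(min_fold)
    return sorted(f for f, r in ratios.items() if abs(r) >= cutoff)

# Hypothetical signals for three features.
control = {"geneA": 1000.0, "geneB": 500.0, "geneC": 200.0}
treated = {"geneA": 4000.0, "geneB": 450.0, "geneC": 50.0}
ratios = log2_ratios(control, treated)
print(changed(ratios))  # ['geneA', 'geneC']
```

Working on the log scale makes a 4-fold induction (+2) and a 4-fold repression (-2) symmetric, which is convenient when the same cutoff is applied in both directions.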

4. Expression data analysis 

Bioinformatics is a key element required to organize, analyze and store expression data from either source, the mRNA or the protein level. Once a mass of high-quality quantitative expression data has been collected, the overall objective is to visualize complex patterns of gene expression changes, to detect pathways and sets of genes tightly correlated with treatment efficacy and toxicity, and to compare the effects of different treatments (Anderson et al., 1996). As the database of drug effects grows, it becomes possible to detect similarities and differences between the molecular fingerprints produced by various drugs, information that may be crucial in deciding whether to refocus or extend the therapeutic spectrum of a drug candidate.
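Comparing molecular fingerprints can be illustrated with one simple similarity measure: the Pearson correlation between two drugs' log-ratio profiles over the same genes. This is a minimal sketch under that assumption; real analyses typically use clustering or more robust statistics, and the drug names and values below are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)

def rank_by_similarity(query, library):
    """Sort reference fingerprints by correlation with a query fingerprint.

    query: log ratios for a fixed, ordered list of genes.
    library: dict mapping a drug name to log ratios for the same genes.
    Returns (name, r) pairs, most similar first.
    """
    scores = [(name, pearson(query, profile)) for name, profile in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical fingerprints over the same four genes.
library = {"drug_X": [2.0, -1.0, 0.5, 0.0], "drug_Y": [-2.0, 1.0, -0.5, 0.0]}
query = [1.8, -0.9, 0.4, 0.1]
for name, r in rank_by_similarity(query, library):
    print(f"{name}: r = {r:.2f}")
```

A strongly positive r suggests a shared mode of action, while a strongly negative r indicates opposite regulation of the same genes.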



5. Comparison of global mRNA and protein 
expression profiling 

There are several synergies and overlaps between the data obtained by mRNA and protein expression analysis. Proteins encoded by low-abundance transcripts may not be easily quantified using standard two-dimensional gel electrophoresis, and their detection may require prefractionation of samples. The expression of such genes may be better quantified at the mRNA level using techniques that allow PCR-mediated target amplification. Tissue biopsy samples typically yield good-quality mRNA and proteins; however, the quality of mRNA isolated from body fluids is often poor, owing to the faster degradation of mRNA compared with proteins. RNA samples from body fluids such as serum or urine are often not very meaningful, and secreted proteins are likely to be more reliable surrogate markers for treatment efficacy and safety. Detection of post-translational modifications, events often related to the function or nonfunction of a protein, is restricted to protein expression analysis and can rarely be predicted by mRNA profiling. Information on the subcellular localization and translocation of proteins has to be acquired at the protein level in combination with sample prefractionation procedures. The growing evidence of a poor correlation between mRNA and protein abundance (Anderson and Seilhamer, 1997) further suggests that the two approaches, mRNA and protein profiling, are complementary and should be applied in parallel.
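The degree of correlation between mRNA and protein abundance is usually summarized with a correlation coefficient. A rank-based (Spearman) coefficient is one reasonable choice because abundances span orders of magnitude and need not be linearly related; that choice, the function names and the paired values below are assumptions for illustration only.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = ranks(x), ranks(y)
    mean = (len(rx) + 1) / 2  # mean of ranks 1..n, unchanged by averaged ties
    sxy = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    sxx = sum((a - mean) ** 2 for a in rx)
    syy = sum((b - mean) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("all values tied; correlation undefined")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements for five genes.
mrna = [120.0, 45.0, 300.0, 80.0, 10.0]
protein = [2.1, 3.5, 1.0, 0.9, 0.4]
print(round(spearman(mrna, protein), 2))  # 0.3
```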

6. Expression profiling and drug development

Understanding the mechanisms of action and toxicity, and being able to monitor treatment efficacy and safety during trials, are crucial for the successful development of a drug. Mechanistic insights are essential for the interpretation of drug effects and improve the chances of recognizing potential species specificities, contributing to an improved risk profile in humans (Richardson et al., 1993; Steiner et al., 1996b; Aicher et al., 1998). The value of expression profiling increases further when links between treatment-induced expression profiles and specific pharmacological and toxic endpoints are established (Anderson et al., 1991, 1995, 1996; Steiner et al., 1996a). Changes in gene expression are known to precede the manifestation of morphological alterations, giving expression profiling great potential for early compound screening: it can enable the selection of drug candidates with wide therapeutic windows, reflected by molecular fingerprints indicative of high pharmacological potency and low toxicity (Arce et al., 1998). In later phases of drug development, surrogate markers of treatment efficacy and toxicity can be applied to optimize the monitoring of preclinical and clinical studies (Doherty et al., 1998).



7. Perspectives

The basic methodology of safety evaluation has changed little during the past decades. Toxicity in laboratory animals has been evaluated primarily using hematological, clinical chemistry and histological parameters as indicators of organ damage. The rapid progress in genomics and proteomics technologies creates a unique opportunity to dramatically improve the predictive power of safety assessment and to accelerate the drug development process. Application of gene and protein expression profiling promises to improve lead selection, resulting in the development of drug candidates with higher efficacy and lower toxicity. Identification of biologically relevant surrogate markers correlated with treatment efficacy and safety holds great potential to optimize the monitoring of preclinical and clinical trials.
DNA array technology makes it possible to rapidly genotype individuals or quantify the expression of thousands of genes on a single filter or glass slide, and holds enormous potential in toxicologic applications. This potential led to a U.S. Environmental Protection Agency-sponsored workshop titled "Application of Microarrays to Toxicology" on 7-8 January 1999 in Research Triangle Park, North Carolina. In addition to providing state-of-the-art information on the application of DNA or gene microarrays, the workshop catalyzed the formation of several collaborations, committees, and users' groups throughout the Research Triangle Park area and beyond. Potential applications of microarrays to toxicologic research and risk assessment include genome-wide expression analyses to identify gene-expression networks and toxicant-specific signatures that can be used to define modes of action, to assess exposure, and to monitor the environment. Arrays may also prove useful for monitoring genetic variability and its relationship to toxicant susceptibility in human populations. Key words: DNA arrays, gene arrays, microarrays, toxicology. Environ Health Perspect 107:681-685 (1999). [Online 6 July 1999]



Decoding the genetic blueprint is a dream that 
offers manifold returns in terms of understand- 
ing how organisms develop and function in an 
often hostile environment. With the rapid 
advances in molecular biology over the last 30 
years, the dream has come a step closer to reali- 
ty. Molecular biologists now have the ability to 
elucidate the composition of any genome. 
Indeed, almost 20 genomes have already been sequenced and more than 60 are currently under way. Foremost among these is the Human Genome Mapping Project. However, the genomes of a number of commonly used laboratory species are also under intensive investigation, including yeast, Arabidopsis, maize, rice, zebrafish, mouse, rat, and dog. It is widely expected that the completion of such programs will facilitate the development of many powerful new techniques and approaches to diagnosing and treating the genetically and environmentally induced diseases that afflict mankind. However, the vast amount of data
being generated by genome mapping will 
require new high-throughput technologies to 
investigate the function of the millions of new 
genes that are being reported. Among the most 
widely heralded of the new functional 
genomics technologies are DNA arrays, which 
represent perhaps the most anticipated new 
molecular biology technique since polymerase 
chain reaction (PCR). 

Arrays enable the study of literally thou- 
sands of genes in a single experiment. The 
potential importance of arrays is enormous and 
has been highlighted by the recent publication 
of an entire Nature Genetics supplement dedicated to the technology (1). Despite this huge surge of interest, DNA arrays are still little used and largely unproven, as demonstrated by the high ratio of review and press articles to actual data papers. Even so, the potential they offer has driven venture capitalists into a frenzy of investment, and many new companies are springing up to claim a share of this rapidly developing market.

The U.S. Environmental Protection 
Agency (EPA) is interested in applying DNA 
array technology to ongoing toxicologic stud- 
ies. To learn more about the current state of 
the technology, the Reproductive Toxicology 
Division (RTD) of the National Health and 
Environmental Effects Research Laboratory 
(NHEERL; Research Triangle Park, NC) 
hosted a workshop on "Application of 
Microarrays to Toxicology" on 7-8 January 
1999 in Research Triangle Park, North 
Carolina. The workshop was organized by 
David Dix, Robert Kavlock, and John Rockett 
of the RTD/NHEERL. Twenty-two intra- 
mural and extramural scientists from govern- 
ment, academia, and industry shared informa- 
tion, data, and opinions on the current and 
future applications for this exciting new technology. The workshop had more than 150 attendees, including researchers, students, and
administrators from the EPA, the National 
Institute of Environmental Health Sciences 
(NIEHS), and a number of other establish- 
ments from Research Triangle Park and 
beyond. Presentations ranged from the tech- 
nology behind array production through the 
sharing of actual experimental data and projec- 
tions on the future importance and applica- 
tions of arrays. The information contained in 
the workshop presentations should provide aid 
and insight into arrays in general and their 
application to toxicology in particular. 

Array Elements 

In the context of molecular biology, the word "array" normally refers to a series of DNA or protein elements firmly attached in a regular pattern to some kind of supporting medium. The term DNA array is often used interchangeably with gene array or microarray. Although not formally defined, microarray generally describes the higher-density arrays typically printed on glass chips. The DNA elements that make up DNA arrays can be oligonucleotides, partial gene sequences, or full-length cDNAs. Companies offering premade arrays that contain less than full-length clones normally use gene-specific regions to prevent false positives arising through cross-hybridization. Sequence verification of cDNA clone identity is necessary because of errors in identifying specific clones from cDNA libraries and databases. Premade DNA arrays printed on membranes are currently or imminently available for human, mouse, and rat. In most cases they contain
DNA sequences representing several thou- 
sand different sequence clusters or genes as 
delineated through the National Center for 
Biotechnology Information UniGene Project 
(2). Many of these different UniGene clusters 
(putative genes) are represented only by 
expressed sequence tags (ESTs). 
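Choosing gene-specific regions to avoid cross-hybridization can be illustrated with a deliberately crude screen, assumed here for illustration: reject a candidate probe region if it shares any exact k-mer with either strand of another gene on the array. Real probe design also weighs melting temperature, secondary structure and near-matches, so an empty result is necessary but not sufficient for specificity; the function names and the toy sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def kmers(seq, k):
    """Set of all length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_kmers(probe, other_genes, k=20):
    """Count exact k-mers a candidate probe shares with each other gene.

    Both strands of every other gene are checked. Genes with no shared
    k-mer are omitted from the result.
    """
    probe_set = kmers(probe, k)
    hits = {}
    for gene_id, seq in other_genes.items():
        both_strands = kmers(seq, k) | kmers(reverse_complement(seq), k)
        n = len(probe_set & both_strands)
        if n:
            hits[gene_id] = n
    return hits

# Toy example with a short k; real probe regions are far longer.
others = {"g1": "TTAAAACCGG", "g2": "GGGGTTTT", "g3": "GATCGATC"}
print(shared_kmers("AAAACCCC", others, k=5))  # {'g1': 2, 'g2': 4}
```

Here g2 is the exact reverse complement of the candidate, which is why checking both strands matters.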

Array Printing 

Arrays are typically printed on one of two 
types of support matrix. Nylon membranes 
are used by most off-the-shelf array providers 
such as Clontech Laboratories, Inc. 
(Palo Alto, CA), Genome Systems, Inc. (St. 
Louis, MO), and Research Genetics, Inc. 
(Huntsville, AL). Microarrays such as those 
produced by Affymetrix, Inc. (Santa Clara, 
CA), Incyte Pharmaceuticals, Inc. (Palo Alto, 
CA), and many do-it-yourself (DIY) arraying 
groups use glass wafers or slides. Although standard microscope slides may be used, they must first be treated to promote adhesion of the DNA to the glass. Several different coatings have been used successfully, including silane and lysine. Slides can easily be coated in the laboratory, but many prefer the convenience of precoated slides available from suppliers.

Once the support matrix has been prepared, the DNA elements can be applied by several methods. Affymetrix, Inc., has developed a unique photolithographic technology for attaching oligonucleotides to glass wafers. More commonly, DNA is applied by either noncontact or contact printing. Noncontact printers can use thermal, solenoid, or piezoelectric technology to spray aliquots of solution onto the support matrix and may be used to produce slide- or membrane-based arrays. Cartesian Technologies, Inc. (Irvine, CA) has developed nQUAD technology for use in its PixSys printers. The system couples a syringe pump with a microsolenoid valve, a combination that provides rapid, quantitative dispensing of nanoliter volumes (down to 4.2 nL) over a variable volume range. A different approach to noncontact printing uses a solid pin and ring combination (Genetic MicroSystems, Inc., Woburn, MA). This system (Figure 1) accommodates a broader range of samples, including cell suspensions and particulates, because the printing head cannot clog in the way a spray nozzle can. Fluid transfer in this system is controlled primarily by the pin dimensions and the force of deposition, although the nature of the support matrix and the sample will also affect transfer to some degree.

In contact printing, the pin head is dipped in the sample and then touched to the support matrix to deposit a small aliquot. Split pins were one of the first contact-printing devices to be reported and are the suggested format for DIY arrayers, as described by Brown (3). Split pins are small metal pins with a precise groove cut vertically in the middle of the pin tip. In this system, 1-48 split pins are positioned in the pin head. The split pins work by simple capillary action, much like a fountain pen: when the pin heads are dipped in the sample, liquid is drawn into the pin groove. A small (fixed) volume is then deposited each time the split pins are gently touched to the support matrix. Sample (100-500 pL, depending on a variety of parameters) can be deposited on multiple slides before refilling is required, and array densities of >2,500 spots/cm² may be produced. The deposited volume depends on the split size, sample fluidity, and the speed of printing. Split pins are relatively simple to produce and can be made in-house if a suitable machine shop is available. Alternatively, they can be obtained directly from companies such as TeleChem International, Inc. (Sunnyvale, CA).
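The density figure can be related to spot spacing. Assuming a square grid (an assumption; printers may use other layouts), density equals 1/pitch², so >2,500 spots/cm² corresponds to a center-to-center pitch of at most 200 µm (10,000 µm / 50). The 20 × 50 mm printable area in the example is hypothetical.

```python
import math

UM_PER_CM = 10_000

def pitch_for_density(spots_per_cm2):
    """Center-to-center spacing (µm) of a square grid with the given density."""
    return UM_PER_CM / math.sqrt(spots_per_cm2)

def density_for_pitch(pitch_um):
    """Spots per cm² of a square grid with the given spacing (µm)."""
    return (UM_PER_CM / pitch_um) ** 2

def spots_in_area(width_mm, height_mm, pitch_um):
    """Spots that fit in a rectangular printable area on a square grid."""
    columns = int(width_mm * 1000 // pitch_um)
    rows = int(height_mm * 1000 // pitch_um)
    return columns * rows

print(pitch_for_density(2_500))    # 200.0
print(density_for_pitch(200))      # 2500.0
print(spots_in_area(20, 50, 200))  # 25000 (hypothetical 20 x 50 mm area)
```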

Irrespective of their source, printers should be run through a preprint sequence before the actual experimental arrays are produced; the first 100 or so spots of a new run tend to be somewhat variable. Factors affecting spot reproducibility include slide treatment homogeneity, sample differences, and instrument errors. Other factors that come into play include clean ejection of the drop and clogging (nQUAD printing), and mechanical variation and long-term alteration of the print-head surface (solid and split pins). However, with careful preparation it is possible to achieve a coefficient of variation for spot reproducibility below 10%.
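The coefficient of variation (CV) quoted above is the standard deviation of replicate spot intensities divided by their mean. A minimal sketch, assuming the sample standard deviation and an optional number of leading spots to discard (standing in for the variable start of a run); the intensities are hypothetical and `skip=3` is kept small only to fit the example.

```python
import statistics

def spot_cv(intensities, skip=0):
    """Coefficient of variation (%) of replicate spot intensities.

    skip: number of leading spots to discard, e.g. the variable spots at the
    start of a print run.
    """
    values = list(intensities)[skip:]
    if len(values) < 2:
        raise ValueError("need at least two spots after skipping")
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean intensity is zero")
    return 100.0 * statistics.stdev(values) / mean

# Hypothetical intensities of one control DNA printed repeatedly.
run = [400, 150, 980, 1000, 1050, 950, 1020, 990]
print(round(spot_cv(run, skip=3), 1))  # 3.7
```

A value under 10 would meet the reproducibility target mentioned in the text.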

One potential printing problem is sample 
carryover. Repeated washing, blotting, and 
drying (vacuum) of print pins between samples 
is normally effective at reducing sample carryover to negligible amounts. Printing should also be carried out in a controlled environment. Printers can be placed in humidified chambers, which help prevent dust contamination and produce a uniform drying rate, an important determinant of spot size, quality, and reproducibility.

In summary, although several printing technologies are available, none is particularly outstanding, and all are still at a relatively early stage of evolution.

Array Hybridization 

The hybridization protocol is, practically 
speaking, relatively straightforward and those 
with previous experience in blotting should 
have little difficulty. Array hybridizations 
are, in essence, reverse Southern/Northern 
blots — instead of applying a labeled probe to 
the target population of DNA/RNA, the 
labeled population is applied to the probe(s). 
With membrane-based arrays, the control and treated mRNA populations are normally converted to cDNA and labeled with an isotope (e.g., ³³P) in the process. These labeled populations are then hybridized independently to parallel or serial arrays, and the hybridization signal is detected with a phosphorimager. A less commonly used alternative to radioactive probes is enzymatic detection. The probe may be biotinylated or haptenylated, or may have alkaline phosphatase or horseradish peroxidase attached. Hybridization is then detected by an enzymatic reaction that yields a color (4). Differences in hybridization signals can be detected by eye or, more accurately, with the help of digital imaging and commercially available software. The labeling of the test populations for slide-based microarrays uses a slightly different approach. The probe typically consists of two samples of poly(A)+ RNA (usually from a treated and a control population) that are converted to cDNA, each being labeled with a different fluor in the process. The independently labeled probes are then mixed together and hybridized to a single microarray slide, and the resulting combined fluorescent signal is scanned. After normalization, it is possible to determine the ratio of fluorescent signals from a single hybridization of a slide-based microarray.

Figure 1. Genetic MicroSystems (Woburn, MA) pin ring system for printing arrays. The pin ring combination consists of a circular open ring oriented parallel to the sample solution, with a vertical pin centered over the ring. When the ring is dipped into a solution and lifted, it withdraws an aliquot of sample held by surface tension. To spot the sample, the pin is driven down through the ring and a portion of the solution is transferred to the bottom of the pin. The pin continues to move downward until the pendant drop of solution makes contact with the underlying surface. The pin is then lifted, and gravity and surface tension cause deposition of the spot onto the array. Figure from Flowers et al. (14), with permission from Genetic MicroSystems.

cDNA derived from control and treated populations of RNA is most commonly hybridized to arrays, although subtractive hybridization or differential display reactions may also be used. Fluorophore- or radiolabeled nucleotides are directly incorporated into the cDNA in the process of converting RNA to cDNA. Alternatively, 5' end-labeled primers may be used for cDNA synthesis. These primers may carry a fluorophore for direct visualization of the hybridized array, or biotin or a hapten may be attached to the primer, in which case fluor-labeled streptavidin or antibody must be applied before a signal can be generated. The most commonly used fluorophores at present are cyanine (Cy)3 and Cy5 (Amersham Pharmacia Biotech AB, Uppsala, Sweden). However, the relative expense of these fluorescent conjugates has driven a search for cheaper alternatives. Fluorescein, rhodamine, and Texas red have all been used, and companies such as Molecular Probes, Inc. (Eugene, OR) are developing a series of labeled nucleotides with a wide range of excitation and emission spectra that may prove to function as well as the Cy dyes.
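For two-color slides labeled with Cy5 and Cy3, the normalization step that precedes ratio calculation can take many forms. One simple, common choice, sketched here as an assumption rather than as the workshop's method, is global median centering: shift all log ratios so that their median is zero, on the premise that most genes are unchanged and any overall offset reflects labeling or scanning differences. Treating Cy5 as the treated channel and Cy3 as the control is also an assumption.

```python
import math
import statistics

def normalized_log_ratios(cy5, cy3):
    """Median-centred log2(Cy5/Cy3) ratio for each spot.

    cy5, cy3: background-subtracted, positive intensities in the same spot
    order. Centring on the median assumes most genes are unchanged.
    """
    if len(cy5) != len(cy3):
        raise ValueError("both channels must list the same spots")
    raw = [math.log2(red / green) for red, green in zip(cy5, cy3)]
    offset = statistics.median(raw)
    return [value - offset for value in raw]

# Hypothetical spots: the Cy5 channel reads uniformly 2x brighter,
# except for one gene that is genuinely induced.
cy3 = [100.0, 200.0, 400.0, 800.0]
cy5 = [200.0, 400.0, 800.0, 6400.0]
print(normalized_log_ratios(cy5, cy3))  # [0.0, 0.0, 0.0, 2.0]
```

The uniform 2-fold dye bias is removed, leaving only the induced gene with a nonzero log ratio.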
Table 1. Advantages and disadvantages of different microarray scanning systems.

System | Advantages | Disadvantages
CCD camera | Few moving parts; fast scanning of bright samples | Less appropriate for dim samples; optical scatter can limit performance
Nonconfocal laser scanner | Relatively simple optics | Low light collection efficiency; background artifacts not rejected; resolution typically low
Confocal laser scanner | Small depth of focus reduces artifacts; may have high light collection efficiency | Small depth of focus requires scanning precision


Analysis of DNA Microarrays 

Membrane-based arrays are normally analyzed 
on film or with a phosphorimager, whereas 
chip-based arrays require more specialized scan- 
ning devices. These can be divided into three 
main groups: the charge-coupled device camera 
systems, the nonconfocal laser scanners, and the 
confocal laser scanners. The advantages and dis- 
advantages of each system are listed in Table 1. 

Because a typical spot on a microarray can contain >10⁸ molecules, a large variation in signal strength may occur. Current scanners cannot work across this many orders of magnitude (4 or 5 is more typical). However, the scanning parameters can normally be adjusted to collect more or less signal, so that two or three scans of the same array should permit the detection of both rare and abundant transcripts.
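Combining a high-sensitivity and a low-sensitivity scan of the same array can be sketched as follows, under stated assumptions: a 16-bit detector that saturates at 65,535 counts, a linear response, and a gain ratio estimated from spots that are measurable in both scans. The constants, thresholds and function names are hypothetical.

```python
import statistics

SATURATION = 65_535  # assumed ceiling of a 16-bit detector

def estimate_gain_ratio(high, low, saturation=SATURATION, min_signal=100.0):
    """Median high/low intensity ratio over spots usable in both scans."""
    ratios = [h / l for h, l in zip(high, low) if h < saturation and l >= min_signal]
    if not ratios:
        raise ValueError("no spots are measurable in both scans")
    return statistics.median(ratios)

def merge_scans(high, low, gain_ratio, saturation=SATURATION):
    """Use the high-sensitivity value unless saturated; otherwise rescale the low one."""
    if len(high) != len(low):
        raise ValueError("scans must list the same spots")
    return [float(h) if h < saturation else l * gain_ratio for h, l in zip(high, low)]

# Hypothetical intensities of three spots in two scans of one array.
high = [500, 65_535, 30_000]
low = [50, 20_000, 3_000]
g = estimate_gain_ratio(high, low)
print(g)                          # 10.0
print(merge_scans(high, low, g))  # [500.0, 200000.0, 30000.0]
```

The dim spot keeps its high-sensitivity value, and the saturated bright spot is recovered from the low-sensitivity scan.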

When a microarray is scanned, the fluores- 
cent images are captured by software normally 
included with the scanner. Several commercial suppliers provide additional software for quantifying array images, but the software tools are constantly evolving to meet the developing needs of researchers, and it is prudent to define one's own needs and clarify the exact capabilities of the software before purchase. Issues that should be considered include the following:

• Can the software locate offset spots? 

• Can it quantitate across irregular hybridiza- 
tion signals? 

• Can the arrayed genes be programmed in for 
easy identification and location? 

• Can the software connect via the Internet to 
databases containing further information on 
the gene(s) of interest? 
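The first question in this checklist, whether software can locate spots that sit off their nominal grid positions, can be illustrated with a minimal search: look in a small window around each expected center and take the intensity-weighted centroid. The window radius, the plain list-of-lists image and the assumption that background has already been subtracted are all hypothetical simplifications; commercial packages use more robust methods.

```python
def find_spot(image, row, col, radius):
    """Intensity-weighted centroid within a square window around (row, col).

    image: list of lists of non-negative, background-subtracted intensities.
    Returns (row, col) as floats; falls back to the expected position if
    the window contains no signal.
    """
    n_rows, n_cols = len(image), len(image[0])
    total = sum_r = sum_c = 0.0
    for r in range(max(0, row - radius), min(n_rows, row + radius + 1)):
        for c in range(max(0, col - radius), min(n_cols, col + radius + 1)):
            value = image[r][c]
            total += value
            sum_r += value * r
            sum_c += value * c
    if total == 0:
        return float(row), float(col)
    return sum_r / total, sum_c / total

image = [[0] * 7 for _ in range(7)]
image[4][5] = 10  # a spot printed off its nominal position (3, 3)
print(find_spot(image, 3, 3, radius=2))  # (4.0, 5.0)
```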

One of the key issues raised at the work- 
shop was the sensitivity of microarray technol- 
ogy. Experiments by General Scanning, Inc. 
(Watertown, MA), have shown that by using 
the Cy dyes and their scanner, signal can be 
detected down to levels of < 1 fluor molecule 
per square micrometer, which translates to 
detecting a rare message at approximately one 
copy per cell or less. 

Array Applications 

Although arrays are an emerging technology certain to undergo improvement and alteration, they have already been applied usefully to a number of model systems. Arrays are at their most powerful when they contain the entire genome of the species they are being used to study. For this reason, they have strong support among researchers using yeast and Caenorhabditis elegans (5). The genomes of both of these species have been sequenced and, in the case of yeast, deposited onto arrays for examination of gene expression (6,7). With both of these species, it is relatively easy to perturb the expression of individual genes. Indeed, C. elegans knockouts can be made simply by soaking the worms in an antisense solution of the gene to be knocked out.

Table 1 footnote: CCD, charge-coupled device. From Kawasaki (13).

By a process of systematic gene disrup- 
tion, it is now possible to examine the cause 
and effect relationships between different 
genes in these simple organisms. This kind of 
approach should help elucidate biochemical 
pathways and genetic control processes, 
deconvolute polygenic interactions, and 
define the architecture of the cellular network. 
A simple case study of how this can be achieved was presented by Butow [University of Texas Southwestern Medical Center, Dallas, TX (Figure 2)]. Although it is the phenotypic result of a single gene knockout that is being examined, the effect of such a perturbation will almost always be polygenic. Polygenic interactions will become increasingly important as researchers begin to move away from single-gene systems when examining the nature of toxicologic responses to external stimuli. This is especially important in toxicology because the phenotype produced by a given environmental insult is never the result of the action of a single gene; rather, it results from a complex interaction of one or multiple cellular pathways. Phenomena such as quantitative traits (the continuous variation of phenotype), epistasis (the effect of alleles of one or more genes on the expression of other genes), and penetrance (the proportion of individuals of a given genotype that display a particular phenotype) will become increasingly evident and important as toxicologists push toward the ultimate goal of matching the responses of individuals to different environmental stimuli.
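The reasoning in Figure 2 can be expressed as a small rule: compare each gene's expression in the knockout with wild type; if a gene's expression falls, the deleted gene acted (directly or indirectly) as a positive regulator of it, and if it rises, as a negative regulator. The 2-fold threshold, the gene labels and the function name are assumptions; as the text stresses, real perturbations are usually polygenic, so this labels only the net direction of effect, not the wiring.

```python
import math

def infer_regulation(wild_type, knockout, min_fold=2.0):
    """Label the net effect of a deleted gene on each measured gene.

    wild_type, knockout: dicts gene -> positive expression level; every gene
    in wild_type must also be present in knockout.
    'positive': expression fell at least min_fold in the knockout (the
    deleted gene normally promotes it); 'negative': it rose at least
    min_fold; 'none': otherwise.
    """
    cutoff = math.log2(min_fold)
    calls = {}
    for gene, wt_level in wild_type.items():
        change = math.log2(knockout[gene] / wt_level)
        if change <= -cutoff:
            calls[gene] = "positive"
        elif change >= cutoff:
            calls[gene] = "negative"
        else:
            calls[gene] = "none"
    return calls

# Hypothetical data loosely mirroring Figure 2: deleting a positive
# regulator lowers both its effector and the downstream target.
wt = {"effector": 100.0, "target": 100.0, "unrelated": 100.0}
ko = {"effector": 25.0, "target": 30.0, "unrelated": 110.0}
print(infer_regulation(wt, ko))
# {'effector': 'positive', 'target': 'positive', 'unrelated': 'none'}
```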

Analysis of the transcriptome (the expres- 
sion level of all the genes in a given cell popula- 
tion) was a use of arrays addressed by several 
speakers. Unfortunately, current gene nomen- 
clature is often confusing in that single genes 
are allocated multiple names (usually as a result 
of independent discovery by different laborato- 
ries), and there was a call for standardization of 
gene nomenclature. Nevertheless, once a transcriptome has been assembled, it can be transferred onto arrays and used to screen any chosen system. The EPA MicroArray Consortium (EPAMAC) is assembling testis transcriptomes for human, rat, and mouse. In a slightly different approach, Nuwaysir et al. (8) describe how the NIEHS assembled what is effectively a "toxicological transcriptome": a library of human and mouse genes that have previously been proven or implicated to be involved in responses to toxicologic insults. Clontech Laboratories, Inc. (Palo Alto, CA), has begun a similar process by developing stress/toxicology filter arrays of rat, mouse, and human genes. Thus, rather than being tissue or cell specific, these stress/toxicology arrays can be used across a variety of model systems to look for alterations in the expression of toxicologically important genes and to define the new field of toxicogenomics. The potential to identify toxicant families based on tissue- or cell-specific gene expression could revolutionize drug testing. These molecular signatures or fingerprints could not only point to the possible toxicity or carcinogenicity of newly discovered compounds (Figure 3) but also aid in elucidating their mechanism of action through identification of gene expression networks. By extension, such signatures could provide easily identifiable biomarkers to assess the degree, time, and nature of exposure.
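Classifying a new compound by its toxicant signature could, for example, work by nearest-centroid assignment: average the expression profiles of known members of each toxicant family, then assign the new compound to the family whose average it correlates with best. This method, the family labels and the profiles below are assumptions for illustration, one of many possible classifiers.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)

def centroid(profiles):
    """Gene-by-gene mean of several equal-length profiles."""
    return [sum(values) / len(profiles) for values in zip(*profiles)]

def classify(profile, training):
    """Assign a profile to the family whose centroid it correlates with best.

    training: dict family name -> list of log-ratio profiles (same gene order).
    Returns (family, r).
    """
    best_family, best_r = None, -math.inf
    for family, profiles in training.items():
        r = pearson(profile, centroid(profiles))
        if r > best_r:
            best_family, best_r = family, r
    return best_family, best_r

# Hypothetical reference profiles over four genes.
training = {
    "family_A": [[2.0, 1.0, 0.0, -1.0], [2.2, 0.8, 0.1, -1.1]],
    "family_B": [[-1.0, 0.0, 2.0, 1.0], [-0.8, 0.2, 1.8, 1.2]],
}
family, r = classify([1.9, 1.0, 0.0, -0.9], training)
print(family)  # family_A
```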

DNA arrays are primarily a tool for exam- 
ining differential gene expression in a given 
model. In this context they are referred to as closed systems because they lack the ability of other differential expression technologies (e.g., differential display and subtractive hybridization) to detect previously unknown genes not present on the array. This would appear to limit the power of DNA arrays to the imaginations and preconceptions of the researcher in selecting genes previously characterized and thought to be involved in the model system. However, the various genome sequencing projects have created a new category of sequence, the EST, that has partially mitigated this deficiency. ESTs are cDNAs expressed in a given tissue that, although they may share some degree of sequence similarity with previously characterized genes, have not been assigned a specific genetic identity. By incorporating EST clones into an array, it is possible to monitor the expression of these unknown genes. This can enable the identification of previously uncharacterized genes that may have biologic significance in the model system. Filter arrays from Research Genetics and slide arrays from Incyte Pharmaceuticals both incorporate large numbers of ESTs from a variety of species.

A further use of microarrays is the identification of single nucleotide polymorphisms (SNPs). These genomic variations are abundant, occurring approximately every 1 kb or so, and are the basis of the restriction fragment length polymorphism analysis used in forensics. Affymetrix, Inc., has designed chips that contain multiple repeats of the same gene sequence, in which each position is represented with all four possible bases. After hybridization of the sample, the degree of hybridization to the different sequences can be measured and the exact sequence of the target gene deduced. SNPs are thought to be of vital importance in drug metabolism and toxicology. For example, single base differences in the regulatory region or active site of some genes can account for huge differences in the activity of that gene. Such SNPs are thought to explain why some people are able to metabolize certain xenobiotics better than others. Thus, arrays provide a further tool for the toxicologist investigating the nature of susceptible subpopulations and toxicologic response.
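The tiling principle described above, with all four bases represented at each interrogated position, reduces to choosing, at each position, the base whose probe hybridizes most strongly. The sketch assumes one intensity per base per position and adds a hypothetical `min_ratio` margin below which the position is called "N"; actual chip designs and calling algorithms are more elaborate (a heterozygous position, for instance, would simply be called "N" here).

```python
BASES = "ACGT"

def call_bases(intensities, min_ratio=1.5):
    """Deduce a sequence from per-position probe intensities.

    intensities: list of dicts {'A': x, 'C': x, 'G': x, 'T': x}, one per
    interrogated position. A position is called 'N' when there is no
    signal or the brightest probe is not min_ratio times the runner-up.
    """
    seq = []
    for probe_set in intensities:
        ranked = sorted(BASES, key=lambda b: probe_set[b], reverse=True)
        best, runner_up = probe_set[ranked[0]], probe_set[ranked[1]]
        if best <= 0 or (runner_up > 0 and best / runner_up < min_ratio):
            seq.append("N")
        else:
            seq.append(ranked[0])
    return "".join(seq)

def find_snps(reference, called):
    """(position, reference base, called base) wherever a confident call differs."""
    return [(i, ref, obs) for i, (ref, obs) in enumerate(zip(reference, called))
            if obs != "N" and obs != ref]

# Hypothetical intensities for three interrogated positions.
positions = [
    {"A": 900, "C": 50, "G": 40, "T": 30},
    {"A": 40, "C": 60, "G": 800, "T": 30},
    {"A": 500, "C": 450, "G": 10, "T": 10},
]
called = call_bases(positions)
print(called)                    # AGN
print(find_snps("ACT", called))  # [(1, 'C', 'G')]
```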

There are still many wrinkles to be ironed 
out before arrays become a standard tool for 
toxicologists. The main issues raised at the 
workshop by those with hands-on experience 
were the following: 

* Expense: the cost of purchasing/contracting 
this technology is still too great for many 
individual laboratories. 




Figure 2. Potential effects of gene knockout within positively and negatively regulated gene expression networks. (A) A simple, two-component linear regulatory network operating on a target gene, where j1 is a positive effector of the target gene and jn is either a positive or negative effector of j1; j1 is limiting in wild type for expression of the target gene. This network could be deduced by examining the consequence of (B) deleting jn on the expression of j1 and of the target gene, where expression of the target gene would be decreased or increased depending on whether jn was a positive or negative regulator. These and other connected components of even greater complexity could be revealed by genome-wide expression analysis. From Butow (75).



• Clones: the logistics of identifying, obtaining, and maintaining a set of nonredundant, noncontaminated, sequence-verified, species/cell/tissue/field-specific clones.

• Use of inbred strains: where whole-organism models are being used, the use of inbred strains is important to reduce the potentially confusing effects of the individual variation typically seen in outbred populations.

• Probe: the need for relatively large amounts of RNA limits the type of sample (e.g., biopsy) that can be used. Also, different RNA extraction methods can give different results.

• Specificity: the ability to discriminate accurately between closely related genes (e.g., the cytochrome P450 family) and splice variants.

• Quantitation: the quantitation of gene expression using gene arrays is still open to debate. One reason for this is the differential incorporation of the labeling dyes. However, the main difficulty lies in knowing what to normalize against. One option is to include a large number of so-called housekeeping genes in the array. However, the expression of these genes often changes depending on the tissue and the toxicant, so it is necessary to characterize their expression in the model system before using them. This is clearly not a viable option when screening multiple new compounds. A second option is to include on the array genes from a nonrelated species (e.g., a plant gene on an animal array) and to spike the probe with synthetic RNA(s) complementary to the gene(s).

• Reproducibility: this is sometimes questionable, and approximately two or three repeats was cited as the minimum number required to confirm initial findings.



Figure 1 panel labels: test compound 1; test compound 2; toxicant families (endocrine disruptors, heavy metals, oxidant stressors, polycyclic aromatic hydrocarbons).



Again, however, most people advocated the 
use of Northern blots or reverse transcriptase 
PCR to confirm findings. 

• Sensitivity: concerns were voiced about the 
number of target molecules that must be pre- 
sent in a sample for them to be detected on 
the array. 

• Efficiency: reproducible identification of 1.5- 
to 2-fold differences in expression was report- 
ed, although the number of genes that 
undergo this level of change and remain 
undetected is open to debate. It is important 
that this level of detection be ultimately 
achieved because it is commonly perceived 
that some important transcription factors 
and their regulators respond at such low levels. In most cases, 3- to 5-fold was the minimum change that most were happy to accept.

• Bioinformatics: perhaps the greatest concern was how to interpret the data with the greatest accuracy and efficiency. The
biggest headache is trying to identify net- 
works of gene expression that are common to 
different treatments or doses. The amount of 
data from a single experiment is huge. It may 
be that, in the future, several groups individ- 
ually equipped with specialized software algo- 
rithms for studying their favorite genes or 
gene systems will be able to share the same 
hybridized chips. Thus, arrays could usher in 
a new perspective on collaboration and the 
sharing of data. 
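The normalization and fold-change issues listed above can be made concrete with a small sketch. It assumes two-channel (control versus treated) intensities and uses spike-in spots, corresponding to the nonrelated-species option described under Quantitation, to estimate a scale factor between channels. It then flags genes whose normalized ratio passes a chosen fold threshold. The median-of-ratios scaling, the gene names and the intensities are illustrative assumptions, not a procedure reported at the workshop.

```python
import math
from statistics import median
from typing import Dict, List

def spike_in_scale(control: Dict[str, float], treated: Dict[str, float],
                   spike_ids: List[str]) -> float:
    """Median treated/control ratio over the spike-in control spots.

    The synthetic spike-in RNAs are assumed to be added in equal amounts
    to both samples, so any departure of their ratio from 1 is treated as
    labeling or scanning bias between the two channels.
    """
    return median(treated[s] / control[s] for s in spike_ids)

def changed_genes(control: Dict[str, float], treated: Dict[str, float],
                  spike_ids: List[str], min_fold: float = 2.0) -> Dict[str, float]:
    """Return {gene: log2 ratio} for genes changed at least `min_fold`-fold."""
    scale = spike_in_scale(control, treated, spike_ids)
    result = {}
    for gene, ctrl_value in control.items():
        if gene in spike_ids:
            continue  # controls are used for scaling, not reported
        log_ratio = math.log2((treated[gene] / scale) / ctrl_value)
        if abs(log_ratio) >= math.log2(min_fold):
            result[gene] = round(log_ratio, 3)
    return result

# Hypothetical intensities: the treated channel reads ~1.5x brighter overall.
control = {"spike1": 1000, "spike2": 2000, "geneA": 500, "geneB": 800, "geneC": 400}
treated = {"spike1": 1500, "spike2": 3000, "geneA": 3000, "geneB": 1200, "geneC": 150}
print(changed_genes(control, treated, ["spike1", "spike2"]))  # {'geneA': 2.0, 'geneC': -2.0}
```

Without the spike-in correction, geneB would appear 1.5-fold induced purely because of channel bias. Raising `min_fold` to 3 or 5 reproduces the more conservative cut-offs most participants said they would accept.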

EPAMAC 

Perhaps the main reason most scientists are 
unable to use array technology is the high cost 
involved, whether buying off-the-shelf membranes, using contract printing services, or




Figure 1. Gene expression profiles (also called fingerprints or signatures) of known toxicants or toxicant families may, in the future, be used to identify the potential toxicity of new drugs, etc. In this example, the genetic signature of test compound 1 is identical to that of known peroxisome proliferators, whereas that of test compound 2 does not match any known toxicant family. Based on these results, test compound 2 would be retained for further testing and test compound 1 would be eliminated.
producing chips in-house. In view of this,
researchers at the RTD/NHEERL initiated 
the EPAMAC. This consortium brings 
together scientists from the EPA and a num- 
ber of extramural labs with the aim of devel- 
oping microarray capability through the shar- 
ing of resources and data. EPAMAC 
researchers are primarily interested in the 
developmental and toxicologic changes seen 
in testicular and breast tissue, and a portion 
of the workshop was set aside for EPAMAC 
members to share their ideas on how the 
experimental application of microarrays could 
facilitate their research. One of the central 
areas of interest to EPAMAC members is the 
effect of xenobiotics on male fertility and 
reproductive health. Of greatest concern is 
the effect of exposure during critical periods 
of development and germ cell differentiation (9), and how this may compromise sperm counts and quality following sexual maturation (10). As well as spermatogenic tissue, there is also interest in how residual mRNA found in mature sperm (11) could be used as
an indicator of previous xenobiotic effects (it 
is easier to obtain a semen sample than a tes- 
ticular biopsy). Arrays will be used to examine 
and compare the effect of exposure to heat 
and chemicals in testicular and epididymal 
gene expression profiles, with the aim of 
establishing relationships/associations 
between changes in developmental landmarks 
and the effects on sperm count and quality. 
Cluster, pattern, and other analysis of such 
data should help identify hidden relationships 
between genes that may reveal potential 
mechanisms of action and uncover roles for 
genes with unknown functions. 
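The cluster analysis mentioned above can take many forms, and published array studies commonly use hierarchical clustering. As a simpler, self-contained illustration, the sketch below groups genes whose expression profiles across conditions correlate at or above a chosen threshold. It uses single-linkage grouping implemented with a small union-find structure. The threshold, gene names and profiles (for example, log2 ratios after heat or chemical exposure) are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def cluster_genes(profiles: Dict[str, Sequence[float]], min_r: float = 0.9) -> List[List[str]]:
    """Group genes linked by any chain of pairwise correlations >= min_r."""
    parent = {g: g for g in profiles}

    def find(g: str) -> str:
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving keeps trees shallow
            g = parent[g]
        return g

    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= min_r:
            parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for g in profiles:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())

# Hypothetical log2 ratios across four exposure conditions.
profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8.5],
    "geneC": [4, 3, 2, 1],
    "geneD": [1, -1, 1, -1],
}
print(cluster_genes(profiles))  # [['geneA', 'geneB'], ['geneC'], ['geneD']]
```

Because only positive correlation is used here, geneC, which mirrors geneA exactly, stays in its own group. Comparing on `abs(r)` instead would pool coordinately up- and downregulated genes, which may or may not suit a given analysis.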

Summary 

The full impact of DNA arrays may not be seen for several years, but the response to this regional workshop indicates the high level of interest they generate. Apart from educating and advertising the various technologies in
this field, this workshop brought together a 
number of researchers from the Research 
Triangle Park area who are already using DNA 
arrays. The interest in sharing ideas and experi- 
ences led to the initiation of a Triangle array 
user's group. 
Array technology is still in its infancy. This means that the hardware is still improving and there is no current consensus on standard procedures, quantitation, and interpretation. Consistency in spotting and scanning arrays is not yet optimized, and this is one of the most critical requirements of any experiment. In addition, one of the dark regions of array technology, strife in the courts over who owns what portions of it, has further muddied the future and is a potential barrier to the development of consensus procedures.

Perhaps the greatest hurdle for the applica- 
tion of arrays is the actual interpretation of 
data. No specialists in bioinformatics attended 
the workshop, largely because they are rare and 
because as yet no one seems clear on the best 
method of approaching data analysis and interpretation. Cross-referencing results from multiple experiments (time, dose, repeats, different animals, different species) to identify commonly expressed genes is a great challenge. In most cases, we are still a long way from understanding how the expression of gene X is related to the expression of gene Y, and from ordering gene expression to delineate causal relationships.
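A first step toward the cross-referencing described above is simply to ask which genes pass a change threshold in several experiments. The sketch assumes each experiment has already been reduced to normalized log2 ratios. The experiment labels, gene names and thresholds are hypothetical, and the sketch deliberately leaves aside the much harder problem of ordering genes into causal relationships.

```python
from typing import Dict, List

def common_responders(experiments: Dict[str, Dict[str, float]],
                      min_abs_log2: float = 1.0,
                      min_experiments: int = 2) -> Dict[str, List[str]]:
    """Find genes changed (|log2 ratio| >= min_abs_log2) in enough experiments.

    `experiments` maps an experiment label (dose, time point, animal, ...)
    to {gene: log2 ratio}. Returns {gene: [labels where it changed]} for
    genes that changed in at least `min_experiments` experiments.
    """
    hits: Dict[str, List[str]] = {}
    for label, ratios in experiments.items():
        for gene, log_ratio in ratios.items():
            if abs(log_ratio) >= min_abs_log2:
                hits.setdefault(gene, []).append(label)
    return {gene: labels for gene, labels in sorted(hits.items())
            if len(labels) >= min_experiments}

# Hypothetical dose/time series for three genes.
experiments = {
    "low dose 6 h": {"geneA": 1.2, "geneB": 0.3, "geneC": -1.5},
    "high dose 6 h": {"geneA": 2.1, "geneB": 1.4, "geneC": -0.2},
    "high dose 24 h": {"geneA": 1.8, "geneC": -1.1},
}
print(common_responders(experiments))
# {'geneA': ['low dose 6 h', 'high dose 6 h', 'high dose 24 h'], 'geneC': ['low dose 6 h', 'high dose 24 h']}
```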

To the ordinary scientist in the typical lab- 
oratory, however, the most immediate prob- 
lem is a lack of affordable instrumentation. 
One can purchase premade membranes at relatively affordable prices. Although these may be useful in identifying individual genes to pursue in more detail using other methods, the numbers that would be required for even a small routine toxicology experiment prohibit this as a truly viable approach. For the toxicologist, there is a need to carry out multiple experiments (dose responses, time curves, multiple animals, and repeats). Glass-based DNA arrays are most attractive in this context because they can be prepared in large batches from the same DNA source and can accommodate control and treated samples on the same chip. Another problem with current off-the-shelf arrays is that they often do not contain one or more of the particular genes a group is interested in. One alternative is to obtain or produce a set of custom clones and contract out the printing of membranes or slides to a company such as Genomic Solutions, Inc. (Ann Arbor, MI). This approach is less expensive than laying out capital for one's own entire system, although at some point it might make economic sense to print one's own arrays.

Finally, DNA arrays are currently a team effort. They are a technology that uses a wide range of skills including engineering, statistics, molecular biology, chemistry, and bioinformatics. Because most individuals are skilled in only one or perhaps two of these areas, success with arrays is most likely to come from teams of collaborators who together cover all of these skills.

Those considering array applications may 
be amused or goaded on by the following 
quote from Fortune magazine (12):

Microprocessors have reshaped our economy,
spawned vast fortunes and changed the way we live. 
Gene chips could be even bigger. 

Although this comment may have been 
designed to excite the imagination rather than 
accurately reflect the truth, it is fair to say that 
the age of functional genomics is upon us. 
DNA arrays look set to be an important tool in 
this new age of biotechnology and will likely 
contribute answers to some of toxicology's 
most fundamental questions. 
You can see the list of clones that we have on our 12K chip at
ht t? : nanus 1 .r.iehs . r.ih . gov saps • guest ' clcnesrch . cf r. 

We selected a subset of genes (2000K) that we believed critical to tox response and basic cellular processes and added a set of clones and [...] this. We have included a set of control genes (80) that were selected by the NHGRI because they did not change across a large set of array experiments. However, we have found that some of these genes change significantly after tox treatments, and we are in the process of looking at the variation of each of these 80 genes across our experiments.
Our chips are constantly changing and being updated, and we hope that our data will lead us to what the toxchip should really be.
I hope this answers your question. 
Cindy Afshari 
> Sent: Monday, June 26, 2000 8:52 PM
> Dear Dr. Afshari,
> Since I have not yet had a response from Bill Grigg, perhaps he was not
> the right person to contact.
>
> Can you help me in this matter? I don't need to know the sequences,
> necessarily, but I would like very much to know what types of sequences
> are being used, e.g., GPCRs (more specific?), ion channels, etc.
>
> Diana Hamlet-Cox
>
> Original Message
> Subject: Toxicology Chip
> Date: Mon, 19 Jun 2000 18:31:48 -0700
> From: Diana Hamlet-Cox <dianahc@incyte.com>
> Organization: Incyte Pharmaceuticals
> To: grigg@niehs.nih.gov
>
> Dear Colleague:
>
> I am doing literature research on the use of expressed genes as
> pharmacotoxicology markers, and found the Press Release dated February
> 29, 2000 regarding the work of the NIEHS in this area. I would like to
> know if there is a resource I can access (or you could provide?) that
> would give me a list of the 12,000 genes that are on your Human ToxChip
> Microarray. In particular, I am interested in the criteria used to
> select sequences for the ToxChip, including any control sequences
> included in the microarray.
>
> Thank you for your assistance in this request.
>
> Diana Hamlet-Cox, Ph.D.
> Incyte Genomics, Inc.
Proteomics: a major new
technology for the drug 
discovery process 

Martin J. Page, Bob Amess, Christian Rohlff, Colin Stubberfield 
and Raj Parekh 



Proteomics is a new enabling technology that is being 
integrated into the drug discovery process. This will 
facilitate the systematic analysis of proteins across any 
biological system or disease, yielding new targets and information on mode of action, toxicology and surrogate markers. Proteomics is highly complementary to
genomic approaches in the drug discovery process and, 
for the first time, offers scientists the ability to integrate 
information from the genome, expressed mRNAs, their 
respective proteins and subcellular localization. It is ex- 
pected that this will lead to important new insights into 
disease mechanisms and improved drug discovery 
strategies to produce novel therapeutics. 



Among the major pharmaceutical and biotechnol- 
ogy companies, it is clearly recognized that the 
business of modern drug discovery is a highly 
competitive process. All of the many steps in- 
volved are inherently complex, and each can involve a 
high risk of attrition. The players in this business strive 
continuously to optimize and streamline the process; each 
seeking to gain an advantage at every step by attempting 
to make informed decisions at the earliest stage possible. 
The desired outcome is to accelerate as many key activities 
in the drug discovery process as possible. This should produce a new generation of robust drugs that offer a high probability of success and reach the clinic and market ahead of the competition.

In recent years, companies have placed noticeable emphasis on aggressively reviewing and refining their strategies to discover new drugs. Central to this has been the introduction and implementation of cutting-edge technologies. Most, if not all, companies have now integrated key technology platforms that incorporate genomics, mRNA expression analysis, relational databases, high-throughput robotics, combinatorial chemistry and powerful bioinformatics. Although it is still early days to quantify the real impact of these platforms in clinical and commercial terms, expectations are high, and it is widely accepted that significant benefits will be forthcoming. This is largely based on data obtained during preclinical studies, where the genomic 1,2 and microarray 3,4 technologies have already proved their value.
However, there are several noteworthy outcomes that result from this. Many have commented that scientists armed with these technologies are now commonly faced
with data overload. Thus, in some instances, rather than 
facilitating the decision process, the accumulation of more 
complex data points, many with unknown consequences, 
can seem to hinder the process. Also, most drug compa- 
nies have simultaneously incorporated very similar compo- 
nents of the new technology platforms, the consequence 
being that it is becoming difficult yet again to determine 
where a clear competitive advantage will arise. Finally, in 
recent years, largely as a result of the accessibility of the 
technologies, there has been an overwhelming emphasis 
placed on genomic and mRNA data rather than on protein 
Figure 1 flowchart steps: Sample; 2D gels and imaging; Curation and interrogation; Differential analysis (Proteograph™); Mass spectrometry and annotation.




Figure 1. Steps involved in analysing a biological sample by proteomics. MCI, molecular cluster index. 



analysis. It is important to remember that proteins dictate 
biological phenotype - whether it is normal or diseased - 
and are the direct targets for most drugs. 

Proteomics: new technology for 
the analysis of proteins 

It is now timely to recognize that complementary technol- 
ogy in the form of high-throughput analysis of the total 
protein repertoire of chosen biological samples, namely 
proteomics, is poised to add a new and important dimen- 
sion to drug discovery. In a similar fashion to genomics, 
which aims to profile every gene expressed in a cell, pro- 
teomics seeks to profile every protein that is expressed 5-7 . 
However, proteomics provides additional information, since it can also be used to identify post-translational modifications of proteins 8, which can have profound effects on biological function, as well as the cellular localization of proteins. Importantly,
proteomics is a technology that integrates the significant 
advances in two-dimensional (2D) electrophoretic separa- 
tion of proteins, mass spectrometry and bioinformatics. 
With these advances it is now possible to consistently de- 
rive proteomes that are highly reproducible and suitable 
for interrogation using advanced bioinformatic tools. 

Different laboratories operate proteomics in many different ways. For the purpose of this review, the process used at Oxford GlycoSciences (OGS), which uses
an industrial-scale operation that is integral to its drug dis- 
covery work, will be described. The individual steps of 
this process, where up to 1000 2D gels can be run and 
analysed per week, are summarized in Fig. 1. The incom- 
ing samples are bar coded and all information relevant to 
the sample is logged into a Laboratory Information 
Management System (LIMS) database. There can be a wide
range in the type of samples processed, as applicable to 
individual steps in the drug discovery pipeline, and these 
will be mentioned later. The samples are separated according to their isoelectric point (pI) in the first dimension, using isoelectric focusing, followed by size (MW) using SDS-PAGE
in the second dimension. Many modifications have been 
made to these steps to improve handling, throughput and 
reproducibility. The separated proteins are then stained 
with fluorescent dyes which are significantly more sensi- 
tive in detection than standard silver methods and have a 
broader dynamic range. The image of the displayed pro- 
teins obtained is referred to as the proteome, and is digi- 
tally scanned into databases using proprietary software 
called ROSETTA™. The images are subsequently curated, 
which begins with the removal of any artefacts, cropping 
and the placement of pI/MW landmarks. The replicate images are then aligned and matched to one






DDT Vol. 4, No. 2 February 1999 



research focus 



another to generate a synthetic composite image. This is an important step: because the proteome is dynamic, the composite image captures the biological variation that occurs, so that even orphan proteins are still incorporated into the analysis.
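Because the first dimension separates proteins by isoelectric point, it can help to see how a pI is estimated from sequence: the pH at which the summed charges of ionizable groups cancel. The sketch below applies the Henderson-Hasselbalch relation and bisects for the zero-charge pH. The pKa values are one approximate textbook set (an assumption; different tools use different tables and give somewhat different answers). Real proteins deviate from such predictions because of post-translational modifications and local environment, which is part of what makes measured 2D gel positions informative.

```python
from typing import Dict

# Approximate pKa values (one textbook set; an assumption, not OGS's method).
PKA_POSITIVE = {"Nterm": 9.69, "K": 10.53, "R": 12.48, "H": 6.00}
PKA_NEGATIVE = {"Cterm": 2.34, "D": 3.65, "E": 4.25, "C": 8.18, "Y": 10.07}

def net_charge(sequence: str, ph: float) -> float:
    """Net charge of an unmodified polypeptide at a given pH."""
    counts: Dict[str, int] = {"Nterm": 1, "Cterm": 1}
    for aa in sequence.upper():
        counts[aa] = counts.get(aa, 0) + 1
    positive = sum(counts.get(group, 0) / (1 + 10 ** (ph - pka))
                   for group, pka in PKA_POSITIVE.items())
    negative = sum(counts.get(group, 0) / (1 + 10 ** (pka - ph))
                   for group, pka in PKA_NEGATIVE.items())
    return positive - negative

def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisect between pH 0 and 14 for the pH where net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2

# Hypothetical short peptides: acidic, basic and mixed.
for peptide in ("DDDDE", "KKKKR", "GAKDE"):
    print(peptide, round(isoelectric_point(peptide), 2))
```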

By means of illustration, Fig. 1 shows the process 
whereby proteomes are generated from normal and dis- 
ease samples and how differentially expressed proteins are 
identified. The potential of this type of analysis is tremen- 
dous. For example, from a mammalian cell sample, in ex- 
cess of 2000 proteins can typically be resolved within the 
proteome. The quality of this is shown in Fig. 2, which 
shows representative proteomes from three diverse bio- 
logical sources: human serum, the pathogenic fungus 
Candida albicans and the human hepatoma cell line 
Huh7. 

Use of proteomics to identify 
disease specific proteins 

In most cases, the drug discovery process is initiated by 
the identification of a novel candidate target - almost al- 
ways a protein - that is believed to be instrumental in the 
disease process. To date, drug targets have been identified by a variety of means. These include molecular, cellular and genomic approaches, mostly
centred upon DNA and mRNA analysis. The gene in ques- 
tion is isolated, and expression and characterization of its 
coded protein product - i.e. the drug target - is invariably 
a secondary event. 

With the proteomic approach, the starting point is at the other end of the 'telescope'. Here there is direct and immediate comparison of the proteomes from paired normal
and disease materials. Examples of these pairs are: (1) pu- 
rified epithelial cell populations derived from human 
breast tumours, matched to purified normal populations of 
human breast epithelial cells, and (2) the invading patho- 
genic hyphal form of C. albicans, matched to the non- 
invading yeast form of C. albicans. When the proteome 
images from each pair are aligned, the Proteograph™ soft- 
ware is able to rapidly identify those proteins (each refer- 
enced as having a unique molecular cluster index, or MCI) 
that are either unique, or those that are differentially ex- 
pressed. Thus, the Proteograph output from this analysis is 
both qualitative and quantitative. 

Proteograph analysis for a particular study can also be 
undertaken on any number of samples. For example, one 
might compare anything from a few to several hundred 
preparations or samples, each from a normal and disease 
counterpart, and have these analysed in a single 
Proteograph study. In this way, it is possible to assign 
strong statistical confidence to the data and in some in- 
stances to identify specific subpopulations within the input 
biological sources. This feature will become increasingly 
significant in the near future, and there is a clear synergy 
here whereby proteomics can work closely with pharma- 
cogenomic approaches to stratify patient populations and 
achieve effective targeted care for the patient. Whatever 
the source of the materials, the net output of Proteograph 
analysis is immediate identification of disease specific pro- 
teins. This is shown in Fig. 3, which shows the results of 
a proteograph obtained by comparing untreated human 
hepatoma cells with cells following exposure to a clinical 
Figure 2. Representative proteomes obtained from (a) human serum, (b) the pathogenic fungus Candida albicans and (c) the human hepatoma cell line Huh7.
Figure 3 legend: foregrounds, Huh7 cells treated with 5-FU; backgrounds, untreated Huh7 cells; bars mark proteins upregulated or downregulated in 5-FU-treated Huh7 cells with respect to untreated Huh7 cells.
Figure 3. Table of differential protein expression profiles, referred to as a Rosetta Proteograph™, between Huh7 cells with and without the cytotoxic agent 5-FU. Bars are quantized and do not represent exact fold change values.



cytotoxic agent. In this instance, only the top 20 differen- 
tially expressed MCIs are shown, but the readout would 
normally extend to a defined cut-off value, typically a two- 
fold or greater difference in expression levels, determined 
by the user. 
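Proteograph™ is proprietary software and its algorithm is not described here. The following is therefore only a generic sketch of the kind of comparison the text describes. Matched features (stand-ins for MCIs) are classed as unique to one proteome, or as up- or downregulated when the foreground/background ratio passes a user-defined cut-off such as twofold. They are then ranked by magnitude of change. Feature identifiers and spot volumes are hypothetical and assumed to be already normalized.

```python
from typing import Dict, List, Tuple

def compare_proteomes(foreground: Dict[str, float], background: Dict[str, float],
                      min_fold: float = 2.0) -> List[Tuple[str, str, float]]:
    """Return (feature, status, fold) rows for unique or differential features.

    fold = foreground / background volume; inf or 0.0 marks unique features.
    """
    rows = []
    for feature in set(foreground) | set(background):
        fg = foreground.get(feature, 0.0)
        bg = background.get(feature, 0.0)
        if fg == 0.0 and bg == 0.0:
            continue
        if bg == 0.0:
            rows.append((feature, "unique to foreground", float("inf")))
        elif fg == 0.0:
            rows.append((feature, "unique to background", 0.0))
        else:
            fold = fg / bg
            if fold >= min_fold:
                rows.append((feature, "up", fold))
            elif fold <= 1.0 / min_fold:
                rows.append((feature, "down", fold))

    def magnitude(row: Tuple[str, str, float]) -> float:
        fold = row[2]
        if fold in (0.0, float("inf")):
            return float("inf")          # unique features rank first
        return max(fold, 1.0 / fold)     # a 4-fold drop ranks with a 4-fold rise

    return sorted(rows, key=lambda r: (-magnitude(r), r[0]))

# Hypothetical normalized spot volumes.
treated = {"MCI-1": 800, "MCI-2": 100, "MCI-3": 410, "MCI-5": 50}
untreated = {"MCI-1": 200, "MCI-2": 400, "MCI-3": 400, "MCI-4": 90}
for row in compare_proteomes(treated, untreated):
    print(row)
```

MCI-3 (about 1.03-fold) falls below the cut-off and is omitted, just as a user-set threshold trims the readout in the text.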

In a typical analysis involving disease and normal mammalian material, in which each proteome would have ~2000 protein features, each assigned an MCI, the Proteograph might identify somewhere in the region of 50-300 MCIs that are unique or differentially expressed. To capitalize rapidly on these data, OGS applies a high-throughput mass spectrometry facility coupled to advanced databases to annotate these MCIs as individual proteins. As these are all disease specific proteins, each could represent a novel target and/or a novel disease marker. The process becomes even more powerful when a panel of features, rather than individual features, is assigned. The relevance of this is apparent when one considers that most diseases, if not all, are multifactorial in nature and arise from polygenic changes. Rather than analysing events in isolation, the ability to examine hundreds or thousands of events simultaneously, as proteomics allows, can offer real advantages.

Identification and assignment of candidate targets 
The rapid identification and assignment of candidate tar- 
gets and markers represents a huge challenge, but this has 
been greatly facilitated by combining the recent advances 
made in proteomics and analytical mass spectrometry 9 . 
Using automated procedures it is now possible to annotate proteins present in femtomole quantities, which represent the low-abundance class of proteins. The process of
annotation is similarly aided by the quality and richness of 
the sequence specific databases that are currently avail- 
able, both in the public domain and in the private sector 
(e.g. those supplied by Incyte Pharmaceuticals). In this re- 
spect, the advances in proteomics have benefited consider- 
ably from the breakthroughs achieved with genomics. 
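The text does not describe OGS's annotation software. As a generic illustration of how sequence databases support mass spectrometric identification, the sketch below implements basic peptide mass fingerprinting. Candidate sequences are digested in silico with trypsin (cleavage after K or R, not before P), singly protonated monoisotopic peptide masses are computed, and the number of observed masses that match within a tolerance is counted. It assumes unmodified peptides, no missed cleavages, a hypothetical 50 ppm tolerance and a toy database. Production search engines use far more elaborate scoring.

```python
from typing import Dict, List, Tuple

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

def tryptic_peptides(sequence: str, min_length: int = 4) -> List[str]:
    """Cleave after K or R unless followed by P; no missed cleavages."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return [p for p in peptides if len(p) >= min_length]

def mh_mass(peptide: str) -> float:
    """Singly protonated monoisotopic mass, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON

def pmf_score(observed: List[float], sequence: str, tol_ppm: float = 50.0) -> int:
    """Number of observed masses matching any theoretical tryptic peptide."""
    theoretical = [mh_mass(p) for p in tryptic_peptides(sequence)]
    return sum(1 for m in observed
               if any(abs(m - t) / t * 1e6 <= tol_ppm for t in theoretical))

def rank_candidates(observed: List[float], database: Dict[str, str],
                    tol_ppm: float = 50.0) -> List[Tuple[str, int]]:
    scores = {name: pmf_score(observed, seq, tol_ppm) for name, seq in database.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy database and "observed" peaks simulated from toy1's own digest.
database = {"toy1": "MAGICKPEPTIDERSEQK", "toy2": "GGGGGGGGGGK"}
observed = [mh_mass(p) for p in tryptic_peptides(database["toy1"])]
print(rank_candidates(observed, database))
```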

From an application perspective, cancer studies provide a 
good opportunity whereby proteomics can be instrumental 
in identifying disease specific proteins, because it is often 
feasible to obtain normal and diseased tissue from the same 
patient. For example, proteomic studies have been reported on neuroblastomas 10, human breast proteins from normal and tumour sources 11-13, lung tumours 14, colon tumours 15 and bladder tumours 16. There are also proteomic studies reported within the cardiovascular therapeutic area, in which disease or response proteins are identified 17,18.

Genomic microarray analysis can similarly identify 
unique species or clusters of mRNAs that are disease specific. However, in some instances, there is a clear lack of correlation between the levels of a specific mRNA and its corresponding protein (Ref. 19; Gygi, S.P. et al., submitted). This has now been noted by many investigators and reaffirms that post-transcriptional events, including protein
stability, protein modification (such as phosphorylation, 
glycosylation, acylation and methylation) and cell localiz- 
ation, can constitute major regulatory steps. Proteomic 
analysis captures all of these steps and can therefore pro- 
vide unique and valuable information independent from, 
or complementary to, genomic data. 
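One simple way to quantify the mRNA-protein discordance described above is a rank correlation between matched transcript and protein abundances for the same genes. The sketch below computes Spearman's rho, averaging ranks over ties. The abundance values are hypothetical. A low rho would be consistent with, but would not by itself prove, the post-transcriptional regulation discussed in the text.

```python
from math import sqrt
from typing import List, Sequence

def ranks(values: Sequence[float]) -> List[float]:
    """Rank values (1 = smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mean = (len(x) + 1) / 2  # mean rank, unchanged by tie averaging
    sxy = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    sxx = sum((a - mean) ** 2 for a in rx)
    syy = sum((b - mean) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)

# Hypothetical matched measurements for six genes.
mrna = [10, 25, 40, 80, 120, 300]     # e.g. array signal
protein = [5, 60, 20, 15, 200, 90]    # e.g. 2D gel spot volume
print(round(spearman(mrna, protein), 3))  # 0.714
```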
Proteomics for target validation and signal transduction studies

The identification of disease specific proteins alone is in- 
sufficient to begin a drug screening process. It is critical to 
assign function and validation to these proteins by con- 
firming they are indeed pivotal in the disease process. 
These studies need to encompass both gain- and loss-of-function analyses. A loss-of-function analysis, for example, would determine whether eliminating the activity of a candidate target (an enzyme, say) by molecular or cellular techniques could reverse a disease phenotype. If it did, the investigator would
have increased confidence that a small-molecule inhibitor 
against the target would also have a similar effect. The 
proposal of candidate drug targets is often not a difficult 
process, but validating them is another matter. Validation 
represents a major bottleneck where the wrong decision 
can have serious consequences 20 . 

Proteomics can be used to evaluate the role of a chosen 
target protein in signal transduction cascades directly rel- 
evant to the disease. In this manner, valuable information 
is forthcoming on the signalling pathways that are per- 
turbed by a target protein and how they might be cor- 
rected by appropriate therapeutics. Techniques that are 
well established in one-dimensional protein studies to in- 
vestigate signalling pathways, such as western blotting 
and immunoprecipitation, are highly suited to proteomic 
applications. For example, the proteomes obtained can be 
blotted onto membranes and probed with antibodies 
against the target protein or related signalling molecules 21-23. Because proteomics can resolve >2000 proteins on a single gel, it is possible to derive important information on specific isoforms (such as glycosylated or phosphorylated variants) of signalling molecules, and hence to characterize how they are altered in the disease process. Western immunoblotting techniques using high-affinity antibodies will typically identify proteins present at ~10 copies per cell (~1.7 fmol); in contrast, the best fluorescent dyes currently available are limited to imaging proteins at 1000 or more copies per cell. The level of sensitivity derived by these
applications will greatly facilitate interpretation of com- 
plex signalling pathways and contribute significantly to 
validation of the target under study. 
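The two sensitivity figures quoted above can be related through Avogadro's number. The text does not state how much lysate they correspond to, so the working below simply assumes that both figures refer to the same sample load. Under that assumption, ~1.7 fmol at ~10 copies per cell implies material from roughly 10^8 cells. The 1000-copy limit of fluorescent dyes at the same load then corresponds to roughly 170 fmol, about 100-fold less sensitive.

```latex
\begin{aligned}
n_{\mathrm{molecules}} &= (1.7\times10^{-15}\ \mathrm{mol})\,(6.022\times10^{23}\ \mathrm{mol^{-1}}) \approx 1.0\times10^{9},\\
N_{\mathrm{cells}} &= \frac{n_{\mathrm{molecules}}}{10\ \mathrm{copies\ cell^{-1}}} \approx 1.0\times10^{8}\ \text{cells},\\
n_{\mathrm{dye}} &= N_{\mathrm{cells}}\times 1000\ \mathrm{copies\ cell^{-1}} \approx 1.0\times10^{11}\ \text{molecules} \approx 170\ \mathrm{fmol}.
\end{aligned}
```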

Immunoprecipitation studies

Similarly, immunoprecipitation studies are another useful 
way to exploit the resolving power of proteomics 24,25 . In 
this instance, very large quantities of protein (e.g. several 
milligrams) can be subjected to incubation with antibodies 
against chosen signalling molecules. This allows high-affinity capture of these proteins, which can subsequently be eluted and electrophoresed on a 2D gel to provide a high-resolution proteome of a specific subset of proteins.
Detection by blot analysis allows the identification of ex- 
tremely small amounts of defined signalling molecules. 
Again, the different isoforms of even very low abundance 
proteins can be seen, and, very importantly, the technique 
allows the investigator to identify multiprotein complexes 
or other proteins that co-precipitate with the target protein. 
These coassociating proteins frequently represent sig- 
nalling partners for the target protein, and their identifi- 
cation by mass spectrometry can lead to invaluable infor- 
mation on the signalling processes involved. 

The depth of signal transduction analysis offered by 
proteomics, and the utility for target validation studies, 
can be extended even further by applying cell fraction- 
ation studies 26-28 . By purifying subcellular fractions, such 
as membrane, nuclear, organelle and cytosolic, it is possi- 
ble to assign a localization to proteins of interest and to 
follow their trafficking in a cell. Enrichment of these fractions will also allow much higher representation of low
abundance proteins on the proteome. Their detection by 
fluorescent dyes or immunoblot techniques will lead to 
the identification of proteins in the range of 1-10 copies 
per cell, putting the sensitivity on a par with genomic 
approaches. 

These signal transduction analyses can be of additional 
value in experiments where inhibitors derived from a 
screening programme against the target are being evalu- 
ated for their potency and selectivity. The inhibitors can 
encompass small molecules, antisense nucleic acid con- 
structs, dominant-negative proteins, or neutralizing anti- 
bodies microinjected into cells. In each case, proteome 
analysis can provide unique data in support of validation 
studies for a chosen candidate drug target. 

Proteomics and drug mode-of-action studies 

Once a validated target is committed to a screening regimen to identify and advance a lead molecule, it is important to confirm that the efficacy of the inhibitor is through
the expected mechanism. Such mode-of-action studies are 
usually tackled by various cell biological and biochemical 
methods. Proteomics can also be usefully applied to these 
studies and this is illustrated below by describing data ob- 
tained with OGT719- This is a novel galactosyl derivative of 
the cytotoxic agent 5-fluorouracil (5-FU), which is currendy 
being developed by OGS for the treatment of hepatocel- 
lular carcinoma and colorectal metastases localized 
in the liver. The premise underpinning the design and ra- 
tionale of OGT719 was to derive a 5-FU prodrug capable 
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Figure 4. Features that are specifically up- or downregulated in Huh 7 cells by either 5-fluorouracil (5-FU) or 
OGT719: (a) elongation factor la2, (b) novel (three peptides by MS-MS) and (c) a-subunit of prolyl-4-hydroxylase. 
Arrows indicate up- or downregulated. 



of targeting, and being retained in, cells bearing the asialo- 
glycoprotein receptor (ASGP-r), including hepatocytes 29 , 
hepatoma Huh7 cells 30 and some colorectal tumour cells 31 . 
The growth of the human hepatoma cell line Huh7 is in- 
hibited by 5-FU or by OGT719- If the inhibition by 
OGT719 were the result of uptake and conversion to 5-FU 
as the active component, then it would be expected that 
Huh7 cells would show similar proteome profiles follow- 
ing exposure to either drug. 

To examine this possibility, we conducted an experiment
taking samples of Huh7 cells that had been treated with
IC50 doses of either OGT719 or 5-FU. Total cell lysates
were prepared and taken through 2D electrophoresis,
fluorescence staining, digital imaging and Proteograph
analysis. To facilitate interpretation of the data across all
of the 2291 features seen on the proteomes, drug-induced
protein changes of fivefold or greater, identified by the
Proteograph, were analysed further. Interestingly, this
analysis showed that the same 19 proteins were changed
fivefold or more by both drugs, strongly suggesting
similarities in the mode of action of these two compounds.
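The filtering step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Proteograph software: each feature is assumed to carry a treated/control abundance ratio, the feature IDs and values are hypothetical, and requiring both drugs to move a feature in the same direction is a condition added for this sketch.

```python
# Sketch: proteins changed at least fivefold by both drugs.
# Assumes per-feature fold changes (treated / control); data are hypothetical.


def strongly_changed(fold_changes: dict[str, float], threshold: float = 5.0) -> set[str]:
    """Features up- or downregulated by at least `threshold`-fold."""
    return {
        feature
        for feature, fold in fold_changes.items()
        if fold >= threshold or fold <= 1.0 / threshold
    }


def co_modulated(drug_a: dict[str, float], drug_b: dict[str, float],
                 threshold: float = 5.0) -> set[str]:
    """Features changed >= threshold-fold by both drugs in the same direction.

    The same-direction requirement is an assumption added for this sketch.
    """
    shared = strongly_changed(drug_a, threshold) & strongly_changed(drug_b, threshold)
    return {f for f in shared if (drug_a[f] > 1.0) == (drug_b[f] > 1.0)}


if __name__ == "__main__":
    five_fu = {"spot_0012": 6.1, "spot_0450": 0.15, "spot_0977": 2.0, "spot_1203": 7.5}
    ogt719 = {"spot_0012": 5.4, "spot_0450": 0.18, "spot_0977": 8.0, "spot_1203": 0.1}
    # spot_1203 is excluded: up with one drug, down with the other.
    print(sorted(co_modulated(five_fu, ogt719)))  # ['spot_0012', 'spot_0450']
```

Working with ratios keeps up- and downregulation symmetric: a fivefold decrease is a ratio of 0.2 or less.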

Thus, from very complex data involving >2000 protein 
features, using proteomics it is possible to analyse quanti- 
tatively and qualitatively each protein during its exposure 
to drugs. The biologist is now able to focus a series of fur- 
ther studies specifically on an enriched subset of proteins. 



Figure 4 highlights selected areas of the proteome where
some of the proteins identified in the above study are
altered in response to either or both drugs.

Several of the proteins identified above as being modu- 
lated similarly by 5-FU or OGT719 in Huh7 cells were sub- 
jected to tandem mass-spectrometric analysis for anno- 
tation. Some of these, such as the nuclear ribosomal 
RNA-binding protein 32 , can be placed into pyrimidine 
pathways or related cell cycle/growth biochemical path- 
ways in which 5-FU is known to act. 

To attribute further significance to the proteome mode- 
of-action studies with OGT719, another cell line, the rat 
sarcoma HSN, was used. Growth of these cells is inhibited
by 5-FU, but they are completely refractory to OGT719; 
notably they lack the ASGP-r, which might explain this 
finding (unpublished). For our proteome studies, HSN 
cells were treated with 5-FU or OGT719 over a time course 
of one, two and four days. At each time point, cells were 
harvested and processed to derive proteomes and 
Proteographs. As before, we purposely focused on those 
proteins that increased or decreased by fivefold or more. 
In this instance, there were no proteins co-modulated by 
the two drugs. This is perhaps to be expected, given that 
the HSN cells are killed by 5-FU and yet are refractory to 
OGT719.
Clear potential

The above is just an example of how proteomics can be 
used to address the mode of action of anticancer drugs. 
The potential of this approach is clear, and one can envisage
situations where it will be profitable to compare the
proteomes of drug-treated cells with those of cells in which
the drug target has been eliminated by molecular knockout
techniques, or of cells treated with small-molecule inhibitors
believed to act specifically on the same target. In addition
to using proteomics to examine the action of drugs, it is
also possible to use this approach to
gauge the extent of nonspecific effects that might eventu- 
ally lead to toxicity. For instance, in the example used 
above with HSN cells treated with OGT719, although cell 
growth was not affected, the levels of several specific pro- 
teins were changed. Further investigation of these proteins 
and the signalling pathways in which they are involved 
could be illuminating in predicting the likelihood or other- 
wise of long-term toxicity. 

Use of proteomics in formal drug toxicology studies

A drug discovery programme at the stage where leads
have been identified and mode-of-action studies are
advanced will proceed to investigate the pharmacokinetic
and toxicology profile of those agents. These two param- 
eters are of major importance in the drug discovery 
process, and many agents that have looked highly promis- 
ing from in vitro studies have subsequently failed because 
of insurmountable pharmacokinetic and/or toxicity prob- 
lems in vivo. Whereas the pharmacokinetic properties of a 
molecule can now be characterized quickly and accu- 
rately, toxicity studies are typically much longer and more 
demanding in their interpretation. 

The ability to achieve fast and accurate predictions of 
toxicity within an in vivo setting would represent a big 
step forward in accelerating any drug discovery pro- 
gramme. Toxicity from a drug can be manifested in any 
organ. However, because the liver and kidney are the 
major sites in the body responsible for metabolism and 
elimination of most drugs, it is informative to examine 
these particular organs in detail to provide early indi- 
cations about events that might result in toxicity. 

The purpose of most xenobiotic metabolizing activity is to
increase the hydrophilicity of the compound and so
facilitate its removal from the body. Most drugs are
metabolized in the liver by the cytochrome P450 family of
enzymes, which are known to comprise a total of ~200
different members 33,34, encompassing a wide array of
overlapping specificities for different substrates. In
addition to clearance, these enzymes also play a major role in
metabolism that can lead to the production and removal of
toxic species, and in some instances it is possible to
correlate the ability or failure to remove such a toxin
with a specific P450 or subgroup.

Unique P450 profiles 

Each individual person will have a slightly different P450
profile, largely as a result of polymorphisms and changes in
expression levels, although other genetic and environmental
factors aside from P450 also need to be taken into
consideration. A significant amount of research is currently
being directed towards this field, known as
pharmacogenomics, with the aim of predicting how a patient
will respond to a drug, as determined by their genetic
make-up 35-37. Marked variation between individuals in their
ability to clear a compound can be one of the key factors in
deciding the overall pharmacokinetic profile of a drug. Not
only will this have a bearing on the likelihood of a patient 
responding to a treatment, but it will also be a factor in 
determining the possibility of their experiencing an ad- 
verse effect. 

Many pharmaceutical companies are already employing 
genomic approaches, involving P450 measurements, as a 
key step in their assessment of the toxicological profile of 
a candidate drug and therefore of its suitability, or other- 
wise, to be considered for human clinical trials. There are 
limits to this approach, however. Whereas P450 mRNA
profiling can predict with some accuracy the likely
metabolic fate of a drug, it will not provide information on
whether the metabolites would subsequently lead to
toxicity. Besides the patient-to-patient differences in
steady-state levels of the P450s, there are also characteristic
induction responses of these enzymes to some drugs. Moreover,
as there can be some doubt over the correlation of mRNA 
levels and the corresponding protein levels, there is scope 
for misinterpretation of the results and hence real advan- 
tages to be gained from a proteome approach. In both in- 
stances, the ability to examine entire proteome profiles, in- 
cluding the P450 proteins, will be a significant advantage 
in understanding and predicting the metabolism and 
toxicological outcome of drugs. 

In addition to direct organ and tissue studies, the serum, 
which collects the majority of toxicity markers released 
from susceptible organs and tissues throughout the entire 
body, can be utilized. Serum is rich in nuclease activity,
so mRNA-based pharmacogenomic approaches are poorly suited
to these samples and valuable markers of toxicity could go undetected.
However, by using proteomics for these types of analyses, 
serum markers (and clusters thereof) are now accessible 
for evaluation as indicators of toxicity. 
Pharmacoproteomics

Proteomics can thus be used to add a new sphere of 
analysis to the study of toxicity at the protein level, and in 
the era of '-omics' there is a case to be made to adopt the 
term 'Pharmacoproteomics™'. Animals can be dosed with 
increasing levels of an experimental drug over time, and 
serum samples can be drawn for consecutive proteome 
analyses. Using this procedure, it should be possible to 
identify individual markers, or clusters thereof, that are 
dose related and correlate with the emergence and severity 
of toxicity. Markers might appear in the serum, at a defined
drug dose and time, that are predictive of early toxicity
within certain organs, toxicity that would have damaging
consequences if allowed to continue. These serum markers
could subsequently be used to predict the response of each
individual and allow therapy to be tailored so that optimal
efficacy is achieved without apparent adverse side effects.
This application can obviously extend to tracking toxicity
of drugs in clinical trials, where serum can be readily drawn
and analysed. Surrogate markers for drug efficacy could also
be detected by this procedure and could help meet the
challenge of identifying the classes of patient who will
respond favourably to a drug, and at what dosage.
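One way to look for dose-related serum markers is to rank each quantified feature by how monotonically its abundance tracks dose. The sketch below uses Spearman rank correlation with a cutoff of |rho| >= 0.8; both the statistic and the cutoff are assumptions of this illustration, not part of the procedure described above, and the marker names and values are hypothetical.

```python
# Sketch: rank serum features by monotonic association with dose.
# Spearman correlation and the 0.8 cutoff are assumptions of this sketch.
from math import sqrt


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x: list[float], y: list[float]) -> float:
    return pearson(ranks(x), ranks(y))


def dose_related(doses: list[float], features: dict[str, list[float]],
                 min_rho: float = 0.8) -> dict[str, float]:
    """Features whose abundance rises or falls monotonically with dose."""
    hits = {}
    for name, levels in features.items():
        rho = spearman(doses, levels)
        if abs(rho) >= min_rho:
            hits[name] = rho
    return hits


if __name__ == "__main__":
    doses = [0, 1, 2, 4, 8]  # hypothetical dose levels
    serum = {
        "marker_A": [1.0, 1.1, 1.9, 3.5, 7.2],  # rises with dose
        "marker_B": [2.0, 1.8, 2.1, 1.9, 2.0],  # no trend
        "marker_C": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls with dose
    }
    for name, rho in dose_related(doses, serum).items():
        print(name, round(rho, 3))
```

A rank-based statistic is used here because it rewards any consistent rise or fall with dose, not only a linear one.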

Conclusions 

In contrast to the agents administered to patients in
clinical wards, the process of drug discovery is not a
prescriptive series of steps. The risks are high and there are
long timelines to be endured before it is known whether a
candidate drug will succeed or fail. At each step of the drug
discovery process there is often scope for flexibility in
interpretation, which over many steps is cumulative. The
pharmaceutical companies most likely to succeed in this
environment are those that are able to make informed,
accurate decisions within an accelerated process.

The genomics revolution has impacted very positively 
upon these issues and now has a powerful new partner in 
proteomics. The ability to undertake global analysis of pro- 
teins from a very wide diversity of biological systems and 
to interrogate these in a high-throughput, systematic
manner will add a significant new dimension to drug
discovery. Each step of the process from target discovery to
clinical trials is accessible to proteomics, often providing
unique sets of data. Using the combination of genomics
and proteomics, scientists can now examine every dimension
of their biological focus: genes, mRNA, proteins and
their subcellular localization. This will greatly assist our
understanding of the fundamental mechanistic basis of
human disease and allow new, improved and faster
drug discovery strategies to be implemented.
Exploring the Metabolic and Genetic Control of Gene Expression on a Genomic Scale

Joseph L. DeRisi, Vishwanath R. Iyer, Patrick O. Brown* 

DNA microarrays containing virtually every gene of Saccharomyces cerevisiae were used 
to carry out a comprehensive investigation of the temporal program of gene expression 
accompanying the metabolic shift from fermentation to respiration. The expression 
profiles observed for genes with known metabolic functions pointed to features of the 
metabolic reprogramming that occur during the diauxic shift, and the expression patterns 
of many previously uncharacterized genes provided clues to their possible functions. The 
same DNA microarrays were also used to identify genes whose expression was affected 
by deletion of the transcriptional co-repressor TUP1 or overexpression of the transcrip- 
tional activator YAP1. These results demonstrate the feasibility and utility of this ap- 
proach to genomewide exploration of gene expression patterns. 



The complete sequences of nearly a dozen 
microbial genomes are known, and in the 
next several years we expect to know the 
complete genome sequences of several 
metazoans, including the human genome. 
Defining the role of each gene in these 
genomes will be a formidable task, and un- 
derstanding how the genome functions as a 
whole in the complex natural history of a 
living organism presents an even greater 
challenge. 

Knowing when and where a gene is 
expressed often provides a strong clue as to 
its biological role. Conversely, the pattern 
of genes expressed in a cell can provide 
detailed information about its state. Al- 
though regulation of protein abundance in 
a cell is by no means accomplished solely 
by regulation of mRNA, virtually all dif- 
ferences in cell type or state are correlated 
with changes in the mRNA levels of many 
genes. This is fortuitous because the only 
specific reagent required to measure the 
abundance of the mRNA for a specific 
gene is a cDNA sequence. DNA microar- 
rays, consisting of thousands of individual 
gene sequences printed in a high-density 
array on a glass microscope slide (1, 2), 
provide a practical and economical tool 
for studying gene expression on a very 
large scale (3-6). 

Saccharomyces cerevisiae is an especially
favorable organism in which to conduct a
systematic investigation of gene expression. 
The genes are easy to recognize in the ge- 
nome sequence, cis regulatory elements are 
generally compact and close to the tran- 
scription units, much is already known 
about its genetic regulatory mechanisms, 
and a powerful set of tools is available for its 
analysis. 

A recurring cycle in the natural history 
of yeast involves a shift from anaerobic 
(fermentation) to aerobic (respiration) me- 
tabolism. Inoculation of yeast into a medi- 
um rich in sugar is followed by rapid growth 
fueled by fermentation, with the production 
of ethanol. When the fermentable sugar is
exhausted, the yeast cells turn to ethanol as 
a carbon source for aerobic growth. This 
switch from anaerobic growth to aerobic 
respiration upon depletion of glucose, re- 
ferred to as the diauxic shift, is correlated 
with widespread changes in the expression 
of genes involved in fundamental cellular 
processes such as carbon metabolism, pro- 
tein synthesis, and carbohydrate storage 
(7). We used DNA microarrays to charac- 
terize the changes in gene expression that 
take place during this process for nearly the 
entire genome, and to investigate the ge- 
netic circuitry that regulates and executes 
this program. 

Yeast open reading frames (ORFs) were 
amplified by the polymerase chain reaction 
(PCR), with a commercially available set of 
primer pairs (8). DNA microarrays, con- 
taining approximately 6400 distinct DNA 
sequences, were printed onto glass slides by
using a simple robotic printing device (9).
Cells from an exponentially growing culture 
of yeast were inoculated into fresh medium 
and grown at 30°C for 21 hours. After an 
initial 9 hours of growth, samples were har- 
vested at seven successive 2-hour intervals, 
and mRNA was isolated (10). Fluorescently 
labeled cDNA was prepared by reverse transcription
in the presence of Cy3 (green)- or Cy5 (red)-labeled
deoxyuridine triphosphate (dUTP) (11) and then
hybridized to the microarrays (12). To maximize the
reliability with which changes in expression
levels could be discerned, we labeled cDNA 
prepared from cells at each successive time 
point with Cy5, then mixed it with a Cy3- 
labeled "reference" cDNA sample prepared 
from cells harvested at the first interval 
after inoculation. In this experimental de- 
sign, the relative fluorescence intensity 
measured for the Cy3 and Cy5 fluors at 
each array element provides a reliable mea- 
sure of the relative abundance of the corre- 
sponding mRNA in the two cell popula- 
tions (Fig. 1). Data from the series of seven 
samples (Fig. 2), consisting of more than 
43,000 expression- ratio measurements, 
were organized into a database to facilitate 
efficient exploration and analysis of the 
results. This database is publicly available 
on the Internet (13). 
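The ratio measurement at each array element can be sketched as follows. This is an illustration only: it assumes background-subtracted spot intensities and a simple global normalization that sets the median log2(Cy5/Cy3) ratio to zero (that is, most genes are assumed unchanged between the two samples). The normalization actually used in the study is not described in this passage, and the gene labels and intensities are hypothetical.

```python
# Sketch: two-colour ratios with a simple global (median) normalization.
# Assumes background-subtracted intensities; the normalization choice is ours.
from math import log2
from statistics import median


def log_ratios(cy5: dict[str, float], cy3: dict[str, float]) -> dict[str, float]:
    """Median-centred log2(Cy5/Cy3) per gene (Cy3 = reference sample)."""
    raw = {
        gene: log2(cy5[gene] / cy3[gene])
        for gene in cy5
        if cy5[gene] > 0 and cy3.get(gene, 0) > 0  # skip empty or missing spots
    }
    offset = median(raw.values())  # assumes most genes are unchanged
    return {gene: r - offset for gene, r in raw.items()}


def classify(log_ratio: float, fold: float = 2.0) -> str:
    """'induced', 'repressed' or 'unchanged' at a given fold cutoff."""
    if log_ratio >= log2(fold):
        return "induced"
    if log_ratio <= -log2(fold):
        return "repressed"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical spot intensities; the Cy5 channel is globally twice as bright.
    cy3 = {"g1": 1000, "g2": 1000, "g3": 1000, "g4": 1000, "g5": 1000}
    cy5 = {"g1": 2000, "g2": 2000, "g3": 2000, "g4": 8000, "g5": 500}
    for gene, r in sorted(log_ratios(cy5, cy3).items()):
        print(gene, r, classify(r))
```

Log ratios make a twofold induction (+1) and a twofold repression (-1) symmetric about zero, which simplifies thresholding.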

During exponential growth in glucose- 
rich medium, the global pattern of gene 
expression was remarkably stable. Indeed, 
when gene expression patterns between the 
first two cell samples (harvested at a 2-hour 
interval) were compared, mRNA levels dif- 
fered by a factor of 2 or more for only 19 
genes (0.3%), and the largest of these differences
was only 2.7-fold (14). However, as
glucose was progressively depleted from the 
growth media during the course of the ex- 
periment, a marked change was seen in the 
global pattern of gene expression. mRNA 
levels for approximately 710 genes were 
induced by a factor of at least 2, and the 
mRNA levels for approximately 1030 genes 
declined by a factor of at least 2. Messenger 
RNA levels for 183 genes increased by a 
factor of at least 4, and mRNA levels for 
203 genes diminished by a factor of at least 
4. About half of these differentially ex- 
pressed genes have no currently recognized 
function and are not yet named. Indeed, 
more than 400 of the differentially ex- 
pressed genes have no apparent homology 
to any gene whose function is known (15).
The responses of these previously uncharacterized
genes to the diauxic shift therefore
provide the first small clue to their possible
roles.
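Counting genes that change by a given factor over the time course can be sketched as below. The sketch assumes each gene has a list of linear Cy5/Cy3 ratios, one per timepoint, relative to the first-interval reference; gene names and values are hypothetical. A gene that rises at one point and falls at another would be counted in both totals.

```python
# Sketch: count genes changed by at least `factor` at any timepoint.
# Ratios are linear Cy5/Cy3 values relative to the first-interval reference.


def count_changed(series: dict[str, list[float]], factor: float) -> tuple[int, int]:
    """Return (number induced, number repressed) by at least `factor`."""
    induced = sum(1 for ratios in series.values() if max(ratios) >= factor)
    repressed = sum(1 for ratios in series.values() if min(ratios) <= 1.0 / factor)
    return induced, repressed


if __name__ == "__main__":
    time_course = {  # hypothetical genes, one ratio per timepoint
        "g1": [1.0, 1.2, 2.5, 5.0],
        "g2": [1.0, 0.9, 0.4, 0.2],
        "g3": [1.0, 1.1, 0.9, 1.3],
        "g4": [1.0, 1.6, 2.1, 3.0],
    }
    for factor in (2, 4):
        print(factor, count_changed(time_course, factor))
```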

The global view of changes in expres- 
sion of genes with known functions pro- 
vides a vivid picture of the way in which 
the cell adapts to a changing environ- 
ment. Figure 3 shows a portion of the yeast 
metabolic pathways involved in carbon 
and energy metabolism. Mapping the 
changes we observed in the mRNAs en- 
coding each enzyme onto this framework 
allowed us to infer the redirection in the 
flow of metabolites through this system. 
We observed large inductions of the genes 
coding for the enzymes aldehyde dehydrogenase
(ALD2) and acetyl-coenzyme
A (CoA) synthase (ACS1), which function
together to convert the products of
alcohol dehydrogenase into acetyl-CoA, 
which in turn is used to fuel the tricarbox- 
ylic acid (TCA) cycle and the glyoxylate 
cycle. The concomitant shutdown of tran- 
scription of the genes encoding pyruvate 
decarboxylase and induction of pyruvate 
carboxylase rechannels pyruvate away 
from acetaldehyde, and instead to oxalac- 
etate, where it can serve to supply the 
TCA cycle and gluconeogenesis. Induction
of the pivotal genes PCK1, encoding
phosphoenolpyruvate carboxykinase, and
FBP1, encoding fructose-1,6-bisphosphatase,
switches the directions of two key
irreversible steps in glycolysis, reversing
the flow of metabolites along the reversible
steps of the glycolytic pathway toward
the essential biosynthetic precursor,
glucose-6-phosphate. Induction of the genes
coding for the trehalose synthase and gly- 
cogen synthase complexes promotes chan- 
neling of glucose-6-phosphate into these 
carbohydrate storage pathways. 

Just as the changes in expression of 
genes encoding pivotal enzymes can pro- 
vide insight into metabolic reprogram- 
ming, the behavior of large groups of func- 
tionally related genes can provide a broad 
view of the systematic way in which the 
yeast cell adapts to a changing environ- 
ment (Fig. 4). Several classes of genes, 
such as cytochrome c-related genes and 
those involved in the TCA/glyoxylate cycle
and carbohydrate storage, were coordinately
induced by glucose exhaustion. In
contrast, genes devoted to protein synthe- 
sis, including ribosomal proteins, tRNA 
synthetases, and translation, elongation, 
and initiation factors, exhibited a coordi- 
nated decrease in expression. More than 
95% of ribosomal genes showed at least 
twofold decreases in expression during the 
diauxic shift (Fig. 4) (13). A noteworthy 
and illuminating exception was that the
genes encoding mitochondrial ribosomal
proteins were generally induced rather than
repressed after glucose limitation, highlighting
the requirement for mitochondrial
biogenesis (13). As more is learned about
the functions of every gene in the yeast 
genome, the ability to gain insight into a 
cell's response to a changing environment 
through its global gene expression patterns 
will become increasingly powerful. 

Several distinct temporal patterns of ex- 
pression could be recognized, and sets of 
genes could be grouped on the basis of the 
similarities in their expression patterns. The 
characterized members of each of these 
groups also shared important similarities in 
their functions. Moreover, in most cases, 
common regulatory mechanisms could be 
inferred for sets of genes with similar expres- 
sion profiles. For example, seven genes 
showed a late induction profile, with mRNA 
levels increasing by more than ninefold at
the last timepoint but less than threefold at
the preceding timepoint (Fig. 5B). All of 
these genes were known to be glucose-re- 
pressed, and five of the seven were previously 
noted to share a common upstream activat- 
ing sequence (UAS), the carbon source re- 
sponse element (CSRE) (16-20). A search 
in the promoter regions of the remaining two 
genes, ACR1 and IDP2, revealed that
ACR1, a gene essential for ACS1 activity,
also possessed a consensus CSRE motif, but 
interestingly, IDP2 did not. A search of the 
entire yeast genome sequence for the con- 
sensus CSRE motif revealed only four addi- 
tional candidate genes, none of which 
showed a similar induction. 
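The late-induction criterion above (more than ninefold at the last timepoint, less than threefold at the preceding one) is a simple filter on each gene's ratio series. A minimal sketch, with hypothetical ORF labels and ratios:

```python
# Sketch: the late-induction filter applied to ratio time series.


def late_induced(series: dict[str, list[float]], final_min: float = 9.0,
                 prior_max: float = 3.0) -> list[str]:
    """Genes induced > final_min-fold at the last timepoint but
    < prior_max-fold at the timepoint before it."""
    return sorted(
        gene for gene, ratios in series.items()
        if len(ratios) >= 2 and ratios[-1] > final_min and ratios[-2] < prior_max
    )


if __name__ == "__main__":
    profiles = {  # hypothetical ORFs, seven timepoints each
        "orf_a": [1.0, 1.0, 1.2, 1.5, 2.0, 2.8, 10.5],   # late induction
        "orf_b": [1.0, 1.5, 3.0, 5.0, 8.0, 10.0, 12.0],  # gradual induction
        "orf_c": [1.0, 1.0, 1.0, 1.0, 1.0, 1.1, 1.0],    # flat
    }
    print(late_induced(profiles))  # ['orf_a']
```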

Fig. 1. Yeast genome microarray. The actual size of the microarray is 18 mm by 18 mm. The
microarray was printed as described (9). This image was obtained with the same fluorescent
scanning confocal microscope used to collect all the data we report (49). A fluorescently labeled
cDNA probe was prepared from mRNA isolated from cells harvested shortly after inoculation (culture
density of <5 × 10⁶ cells/ml and media glucose level of 19 g/liter) by reverse transcription in the
presence of Cy3-dUTP. Similarly, a second probe was prepared from mRNA isolated from cells taken
from the same culture 9.5 hours later (culture density of ~2 × 10⁸ cells/ml, with a glucose level of
<0.2 g/liter) by reverse transcription in the presence of Cy5-dUTP. In this image, hybridization of the
Cy3-dUTP-labeled cDNA (that is, mRNA expression at the initial timepoint) is represented as a green
signal, and hybridization of Cy5-dUTP-labeled cDNA (that is, mRNA expression at 9.5 hours) is
represented as a red signal. Thus, genes induced or repressed after the diauxic shift appear in this
image as red and green spots, respectively. Genes expressed at roughly equal levels before and after
the diauxic shift appear in this image as yellow spots.

Examples from additional groups of
genes that shared expression profiles are
illustrated in Fig. 5, C through F. The
sequences upstream of the named genes in
Fig. 5C all contain stress response
elements (STRE) and, with the exception
of HSP42, have previously been shown to
be controlled at least in part by these
elements (21-24). Inspection of the
sequences upstream of HSP42 and the two
uncharacterized genes shown in Fig. 5C,
YKL026c, a hypothetical protein with
similarity to glutathione peroxidase, and
YGR043c, a putative transaldolase,
revealed that each of these genes also
possesses repeated upstream copies of the
stress-responsive CCCCT motif. Of the 13
additional genes in the yeast genome that
shared this expression profile [including
HSP30, ALD2, OM45, and 10 uncharacterized
ORFs (25)], nine contained one or
more recognizable STRE sites in their
upstream regions.
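Scanning upstream sequences for repeated copies of the CCCCT motif has to consider both DNA strands, since CCCCT on one strand reads AGGGG on the other. The sketch below is illustrative only: the sequence is made up, and treating two or more copies as "repeated" is an assumption of this example.

```python
# Sketch: count a motif on both strands of an upstream sequence.
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_both_strands(seq: str, motif: str = "CCCCT") -> int:
    """Occurrences of `motif` on either strand (overlapping matches allowed)."""
    seq = seq.upper()
    count = len(re.findall(f"(?={motif})", seq))
    rc = reverse_complement(motif)
    if rc != motif:  # avoid double-counting palindromic motifs
        count += len(re.findall(f"(?={rc})", seq))
    return count


if __name__ == "__main__":
    upstream = "TTCCCCTAAAGGGGATCCCCTG"  # made-up promoter fragment
    n = count_both_strands(upstream)
    print(n, "copies; repeated" if n >= 2 else "copies")
```

The zero-width lookahead lets overlapping occurrences be counted, which matters for motifs that can overlap themselves.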

The heterotrimeric transcriptional acti- 
vator complex HAP2,3,4 has been shown 
to be responsible for induction of several 
genes important for respiration (26-28). 
This complex binds a degenerate consensus 
sequence known as the CCAAT box (26). 
Computer analysis, using the consensus se- 
quence TNRYTGGB (29), has suggested 
that a large number of genes involved in 
respiration may be specific targets of 
HAP2,3,4 (30). Indeed, a putative
HAP2,3,4 binding site could be found in 
the sequences upstream of each of the seven 
cytochrome c-related genes that showed 
the greatest magnitude of induction (Fig. 
5D). Of 12 additional cytochrome c-related 
genes that were induced, HAP2,3,4 binding 
sites were present in all but one. Signifi- 
cantly, we found that transcription of 
HAP4 itself was induced nearly ninefold 
concomitant with the diauxic shift. 
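A degenerate consensus such as TNRYTGGB can be searched by translating the standard IUPAC nucleotide codes into a regular expression and scanning both strands. This is a sketch, not the program used in the computer analysis cited above (29, 30); the test sequences are invented.

```python
# Sketch: search a degenerate IUPAC consensus on both strands.
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def to_regex(consensus: str) -> str:
    return "".join(IUPAC[code] for code in consensus.upper())


def find_sites(seq: str, consensus: str = "TNRYTGGB") -> list[tuple[int, str]]:
    """(start, strand) of every match; '-' hits are reported at their
    leftmost position on the given (top) strand."""
    seq = seq.upper()
    rc_consensus = consensus.upper().translate(COMPLEMENT)[::-1]
    hits = [(m.start(), "+") for m in re.finditer(f"(?={to_regex(consensus)})", seq)]
    if rc_consensus != consensus.upper():
        hits += [(m.start(), "-") for m in re.finditer(f"(?={to_regex(rc_consensus)})", seq)]
    return sorted(hits)


if __name__ == "__main__":
    print(find_sites("GGTCATTGGCAA"))  # [(2, '+')]
    print(find_sites("GCCAATGA"))      # [(0, '-')]
```

The reverse complement of TNRYTGGB is VCCARYNA, which contains the CCAAT box when R = A and Y = T.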

Control of ribosomal protein biogenesis 
is mainly exerted at the transcriptional 
level, through the presence of a common 
upstream-activating element (UAS)
that is recognized by the Rap1 DNA-binding
protein (31, 32). The expression profiles
of seven ribosomal protein genes are shown
in Fig. 5F. A search of the sequences
upstream of all seven genes revealed
consensus Rap1-binding motifs (33). It has
been suggested that declining Rap1 levels
in the cell during starvation may be
responsible for the decline in ribosomal
protein gene expression (34). Indeed, we
observed that the abundance of RAP1
mRNA diminished by 4.4-fold at about
the time of glucose exhaustion.

Of the 149 genes that encode known or
putative transcription factors, only two,
HAP4 and SIP4, were induced more than
threefold at the diauxic shift.
SIP4 encodes a DNA-binding transcriptional
activator that has been shown to
interact with Snf1, the "master regulator" of
glucose repression (35). The eightfold
induction of SIP4 upon depletion of glucose
strongly suggests a role in the induction of
downstream genes at the diauxic shift.

Although most of the transcriptional 
responses that we observed were not pre- 
viously known, the responses of many 
genes during the diauxic shift have been 
described. Comparison of the results we 
obtained by DNA microarray hybridiza- 
tion with previously reported results there- 
fore provided a strong test of the sensitiv- 
ity and accuracy of this approach. The 
expression patterns we observed for previ- 
ously characterized genes showed almost 
perfect concordance with previously pub- 
lished results (36). Moreover, the differ- 
ential expression measurements obtained 
by DNA microarray hybridization were re- 
producible in duplicate experiments. For 
example, the remarkable changes in gene 
expression between cells harvested imme- 
diately after inoculation and immediately 
after the diauxic shift (the first and sixth 
intervals in this time series) were mea- 
sured in duplicate, independent DNA mi- 
croarray hybridizations. The correlation 
coefficient for two complete sets of expres- 
sion ratio measurements was 0.87, and for 
more than 95% of the genes, the expression
ratios measured in these duplicate
experiments differed by less than a factor
of 2. However, in a few cases, there were
discrepancies between our results and pre- 
vious results, pointing to technical limita- 
tions that will need to be addressed as 
DNA microarray technology advances 
(37, 38). Despite the noted exceptions, 
the high concordance between the results 
we obtained in these experiments and 
those of previous studies provides confi- 
dence in the reliability and thoroughness 
of the survey. 
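The two reproducibility measures quoted above can be computed from a pair of replicate ratio sets as sketched below. Whether the original correlation coefficient was computed on linear or log-transformed ratios is not stated; this sketch assumes log2 ratios, and the gene labels and values are hypothetical.

```python
# Sketch: reproducibility of duplicate ratio measurements.
from math import log2, sqrt


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def replicate_agreement(rep1: dict[str, float], rep2: dict[str, float],
                        factor: float = 2.0) -> tuple[float, float]:
    """(correlation of log2 ratios, fraction of genes whose two ratios
    differ by less than `factor`)."""
    genes = sorted(set(rep1) & set(rep2))
    l1 = [log2(rep1[g]) for g in genes]
    l2 = [log2(rep2[g]) for g in genes]
    within = sum(1 for a, b in zip(l1, l2) if abs(a - b) < log2(factor))
    return pearson(l1, l2), within / len(genes)


if __name__ == "__main__":
    first = {"g1": 4.0, "g2": 0.25, "g3": 1.0, "g4": 2.0}  # hypothetical
    second = {"g1": 3.0, "g2": 0.3, "g3": 1.1, "g4": 5.0}
    r, frac = replicate_agreement(first, second)
    print(f"r = {r:.2f}; {frac:.0%} of genes agree within twofold")
```

Two ratios differ by less than a factor of 2 exactly when their log2 values differ by less than 1, so both measures are computed on the same transformed data.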

Fig. 2. The section of the array indicated by the gray box in Fig. 1 is shown for each of the
experiments described here. Representative genes are labeled. In each of the arrays used to analyze
gene expression during the diauxic shift, red spots represent genes that were induced relative to the
initial timepoint, and green spots represent genes that were repressed relative to the initial
timepoint. In the arrays used to analyze the effects of the tup1Δ mutation and YAP1 overexpression,
red spots represent genes whose expression was increased, and green spots represent genes whose
expression was decreased by the genetic modification. Note that distinct sets of genes are induced and
repressed in the different experiments. The complete images of each of these arrays can be viewed on
the Internet (13). Cell density as measured by optical density (OD) at 600 nm was used to measure the
growth of the culture.

The changes in gene expression during
this diauxic shift are complex and involve
integration of many kinds of information
about the nutritional and metabolic state
of the cell. The large number of genes
whose expression is altered and the diversity
of temporal expression profiles observed
in this experiment highlight the
challenge of understanding the underlying
regulatory mechanisms. One approach to
defining the contributions of individual
regulatory genes to a complex program of
this kind is to use DNA microarrays to
identify genes whose expression is affected
by mutations in each putative regulatory
gene. As a test of this strategy, we analyzed
the genomewide changes in gene expression
that result from deletion of the TUP1 gene.
Transcriptional repression of many genes by
glucose requires the DNA-binding repressor
Mig1 and is mediated by recruiting the
transcriptional co-repressors Tup1 and Cyc8/Ssn6
(39). Tup1 has also been implicated in
repression of oxygen-regulated, mating-type-specific,
and DNA-damage-inducible genes
(40).

Fig. 3. Metabolic reprogramming inferred from global analysis of changes in gene expression. Only key
metabolic intermediates are identified. The yeast genes encoding the enzymes that catalyze each step
in this metabolic circuit are identified by name in the boxes. The genes encoding succinyl-CoA synthase
and glycogen-debranching enzyme have not been explicitly identified, but the ORFs YGR244 and
YPR184 show significant homology to known succinyl-CoA synthase and glycogen-debranching
enzymes, respectively, and are therefore included in the corresponding steps in this figure. Red boxes
with white lettering identify genes whose expression increases in the diauxic shift. Green boxes with
dark green lettering identify genes whose expression diminishes in the diauxic shift. The magnitude of
induction or repression is indicated for these genes. For multimeric enzyme complexes, such as
succinate dehydrogenase, the indicated fold-induction represents an unweighted average of all the
genes listed in the box. Black and white boxes indicate no significant differential expression (less than
twofold). The direction of the arrows connecting reversible enzymatic steps indicates the direction of
the flow of metabolic intermediates, inferred from the gene expression pattern, after the diauxic shift.
Arrows representing steps catalyzed by genes whose expression was strongly induced are highlighted
in red. The broad gray arrows represent major increases in the flow of metabolites after the diauxic
shift, inferred from the indicated changes in gene expression.



Wild-type yeast cells and cells bearing a deletion of the TUP1 gene (tup1Δ) were grown in parallel cultures in rich medium containing glucose as the carbon source. Messenger RNA was isolated from exponentially growing cells from the two populations and used to prepare cDNA labeled with Cy3 (green) and Cy5 (red), respectively (11). The labeled probes were mixed and simultaneously hybridized to the microarray. Red spots on the microarray therefore represented genes whose transcription was induced in the tup1Δ strain, and thus presumably repressed by Tup1 (41). A representative section of the microarray (Fig. 2, bottom middle panel) illustrates that the genes whose expression was affected by the tup1Δ mutation were, in general, distinct from those induced upon glucose exhaustion [complete images of all the arrays shown in Fig. 2 are available on the Internet (13)]. Nevertheless, 34 (10%) of the genes that were induced by a factor of at least 2 after the diauxic shift were similarly induced by deletion of TUP1, suggesting that these genes may be subject to TUP1-mediated repression by glucose. For example, SUC2, the gene encoding invertase, and all five hexose transporter genes that were induced during the course of the diauxic shift were similarly induced, in duplicate experiments, by the deletion of TUP1.
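The red/green readout described above amounts to thresholding the Cy5/Cy3 ratio of each spot and then intersecting gene sets. The following minimal Python sketch assumes that background-subtracted intensities are already available for each gene. The function names, the toy intensities, and the placeholder genes GENE_X and GENE_Y are hypothetical and are not data from the experiment.

```python
# Minimal sketch. Intensities and gene sets are toy values, not measured data.
# Cy3 = wild-type (green), Cy5 = tup1Δ (red), as in the labeling scheme above.

def induced_genes(spots: dict[str, tuple[float, float]], min_fold: float = 2.0) -> set[str]:
    """Genes whose Cy5/Cy3 ratio is at least min_fold, i.e. 'red' spots."""
    return {
        gene
        for gene, (cy3, cy5) in spots.items()
        if cy3 > 0 and cy5 / cy3 >= min_fold
    }


def overlap_fraction(reference: set[str], other: set[str]) -> float:
    """Fraction of the reference set that also appears in the other set."""
    return len(reference & other) / len(reference) if reference else 0.0


tup1_spots = {
    "SUC2": (100.0, 450.0),  # 4.5-fold higher in tup1Δ: red spot
    "HXT2": (80.0, 200.0),   # 2.5-fold: red spot
    "ACT1": (500.0, 510.0),  # about 1-fold: yellow spot
}
diauxic_induced = {"SUC2", "HXT2", "GENE_X", "GENE_Y"}  # GENE_X/Y are placeholders

tup1_induced = induced_genes(tup1_spots)
print(sorted(tup1_induced))                             # ['HXT2', 'SUC2']
print(overlap_fraction(diauxic_induced, tup1_induced))  # 0.5
```

Applied to the real diauxic-shift set, the same overlap fraction is what the text reports as 34 genes (10%).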

The set of genes affected by Tup1 in this experiment also included α-glucosidases, the mating-type-specific genes MFA1 and MFA2, and the DNA damage-inducible RNR2 and RNR4, as well as genes involved in flocculation and many genes of unknown function. The hybridization signal corresponding to expression of TUP1 itself was also severely reduced because of the (incomplete) deletion of the transcription unit in the tup1Δ strain, providing a positive control in the experiment (42).

Many of the transcriptional targets of Tup1 fell into sets of genes with related biochemical functions. For instance, although only about 3% of all yeast genes appeared to be TUP1-repressed by a factor of more than 2 in duplicate experiments under these conditions, 6 of the 13 genes that have been implicated in flocculation (15) showed a reproducible increase in expression of at least twofold when TUP1 was deleted. Another group of related genes that appeared to be subject to TUP1 repression encodes the serine-rich cell wall mannoproteins, such as Tip1 and Tir1/Srp1, which are induced by cold shock and other stresses (43), and similar, serine-poor proteins, the seripauperins (44). Messenger RNA levels for 23 of the 26 genes in this group were reproducibly elevated by at least 2.5-fold in the tup1Δ strain, and 18 of these genes were induced by more than sevenfold when TUP1 was deleted. In contrast, none of 83 genes that could be classified as putative regulators of the cell division cycle were induced more than twofold by deletion of TUP1. Thus, despite the diversity of the regulatory systems that employ Tup1, most of the genes that it regulates under these conditions fall into a limited number of distinct functional classes.
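To see why 6 of 13 flocculation genes is a striking fraction when only about 3% of all genes are TUP1-repressed, consider how often at least 6 of 13 randomly chosen genes would be affected at a 3% rate. The sketch below uses a simple binomial model. This model is our assumption, since the text does not name a statistical test. The function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance that 6 or more of the 13 flocculation genes would be TUP1-repressed
# if each had only the genome-wide ~3% chance (assumed independent draws).
p_value = binomial_upper_tail(6, 13, 0.03)
print(f"{p_value:.1e}")  # prints 1.0e-06
```

A value this small supports reading the flocculation genes as a coherent Tup1 target class rather than a chance cluster, under the independence assumption above.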

Because the microarray allows us to monitor expression of nearly every gene in yeast, we can, in principle, use this approach to identify all the transcriptional targets of a regulatory protein like Tup1. It is important to note, however, that in any single experiment of this kind we can only recognize those target genes that are normally repressed (or induced) under the conditions of the experiment. For instance, the experiment described here analyzed a MATα strain in which MFA1 and MFA2, the genes encoding the a-factor mating pheromone precursor, are normally repressed. In the isogenic tup1Δ strain, these genes were inappropriately expressed, reflecting the role that Tup1 plays in their repression. Had we instead carried out this experiment with a MATa strain (in which expression of MFA1 and MFA2 is not repressed), it would not have been possible to conclude anything regarding the role of Tup1 in the repression of these genes. Moreover, we cannot distinguish indirect effects of the chronic absence of Tup1 in the mutant strain from effects directly attributable to its participation in repressing the transcription of a gene.

Another simple route to modulating the activity of a regulatory factor is to overexpress the gene that encodes it. YAP1 encodes a DNA-binding transcription factor belonging to the bZIP class of DNA-binding proteins. Overexpression of YAP1 in yeast confers increased resistance to hydrogen peroxide, o-phenanthroline, heavy metals, and osmotic stress (45). We analyzed differential gene expression between a wild-type strain bearing a control plasmid and a strain with a plasmid expressing YAP1 under the control of the strong GAL1-10 promoter, both grown in galactose (that is, a condition that induces YAP1 overexpression). Complementary DNA from the control and YAP1-overexpressing strains, labeled with Cy3 and Cy5, respectively, was prepared from mRNA isolated from the two strains and hybridized to the microarray. Thus, red spots on the array represent genes that were induced in the strain overexpressing YAP1.

Of the 17 genes whose mRNA levels increased by more than threefold when YAP1 was overexpressed in this way, five bear homology to aryl-alcohol oxidoreductases (Fig. 2 and Table 1). An additional four of the genes in this set also belong to the general class of dehydrogenases/oxidoreductases. Very little is known about the role of aryl-alcohol oxidoreductases in S. cerevisiae, but these enzymes have been isolated from ligninolytic fungi, in which they participate in coupled redox reactions, oxidizing aromatic and aliphatic unsaturated alcohols to aldehydes with the production of hydrogen peroxide (46, 47). The fact that a remarkable fraction of the targets identified in this experiment belong to the same small functional group of oxidoreductases suggests that these genes might play an important protective role during oxidative stress. Transcription of a small number of genes was reduced in the strain overexpressing Yap1. Interestingly, many of these genes encode sugar permeases or enzymes involved in inositol metabolism.

Fig. 4. Coordinated regulation of functionally related genes. The curves represent the average induction or repression ratios for all the genes in each indicated group. The total number of genes in each group was as follows: ribosomal proteins, 112; translation elongation and initiation factors, 25; tRNA synthetases (excluding mitochondrial synthetases), 17; glycogen and trehalose synthesis and degradation, 15; cytochrome c oxidase and reductase proteins, 19; and TCA- and glyoxylate-cycle enzymes, 24.

Table 1. Genes induced by YAP1 overexpression. This list includes all the genes for which mRNA levels increased by more than twofold upon YAP1 overexpression in both of two duplicate experiments, and for which the average increase in mRNA level in the two experiments was greater than threefold (50). Positions of the canonical Yap1 binding sites upstream of the start codon, when present, and the average fold-increase in mRNA levels measured in the two experiments are indicated.
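The selection rule stated in the Table 1 caption can be written as a small filter: more than twofold in each of two duplicate experiments, and an average increase greater than threefold. In this sketch the "average" is taken as the arithmetic mean of the two fold-changes, which is an assumption. The gene names and fold values are invented for illustration.

```python
def passes_table1_rule(replicate_folds: list[float],
                       per_replicate_min: float = 2.0,
                       mean_min: float = 3.0) -> bool:
    """True if every replicate exceeds per_replicate_min-fold and the
    arithmetic mean of the replicates exceeds mean_min-fold."""
    if not replicate_folds:
        return False
    mean_fold = sum(replicate_folds) / len(replicate_folds)
    return all(f > per_replicate_min for f in replicate_folds) and mean_fold > mean_min


# Hypothetical duplicate measurements (fold increase on YAP1 overexpression).
measurements = {
    "GENE_A": [4.1, 3.6],  # kept: both > 2, mean 3.85 > 3
    "GENE_B": [5.0, 1.8],  # dropped: second replicate not > 2
    "GENE_C": [2.4, 2.9],  # dropped: mean 2.65 not > 3
}
selected = [g for g, folds in measurements.items() if passes_table1_rule(folds)]
print(selected)  # ['GENE_A']
```

Requiring each replicate to pass on its own guards against a single outlier hybridization inflating the mean, which is the case GENE_B illustrates.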

We searched for Yap1-binding sites (TTACTAA or TGACTAA) in the sequences upstream of the target genes we identified (48). About two-thirds of the genes that were induced by more than threefold upon Yap1 overexpression had one or more binding sites within 600 bases upstream of the start codon (Table 1), suggesting that they are directly regulated by Yap1. The absence of canonical Yap1-binding sites upstream of the others may reflect an ability of Yap1 to bind sites that differ from the canonical binding sites, perhaps in cooperation with other factors, or, less likely, may represent an indirect effect of Yap1 overexpression mediated by one or more intermediary factors. Yap1 sites were found only four times in the corresponding region of an arbitrary set of 30 genes that were not differentially regulated by Yap1.
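The upstream search described above can be sketched as a sliding-window scan. This Python version reflects our assumptions about how such a search might be done. It takes the sequence immediately 5' of the start codon and looks only within the last 600 bases. It can also scan the reverse strand, although the text does not say whether both strands were searched. The function names and the toy sequence are hypothetical.

```python
# Canonical Yap1 sites quoted in the text.
YAP1_SITES = ("TTACTAA", "TGACTAA")
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_yap1_sites(upstream: str, window: int = 600,
                    both_strands: bool = True) -> list[tuple[int, str]]:
    """Scan the last `window` bases of `upstream` (sequence ending just before
    the ATG) for Yap1 sites.

    Returns (offset, site) pairs. offset is the position of the match's 5'-most
    top-strand base relative to the A of the ATG (-1 = adjacent base), and site
    is the canonical site that matched on either strand.
    """
    region = upstream.upper()[-window:]
    patterns = {site: site for site in YAP1_SITES}
    if both_strands:
        patterns.update({reverse_complement(site): site for site in YAP1_SITES})
    k = len(YAP1_SITES[0])  # both canonical sites are 7-mers
    hits = []
    for i in range(len(region) - k + 1):
        word = region[i:i + k]
        if word in patterns:
            hits.append((i - len(region), patterns[word]))
    return hits


promoter = "G" * 50 + "TTACTAA" + "G" * 10  # toy sequence, not a real promoter
print(find_yap1_sites(promoter))             # [(-17, 'TTACTAA')]
```

Running the scan over the target genes and over a background set of unregulated genes gives the kind of comparison the text reports: about two-thirds of targets carry a site, versus only four sites among 30 unregulated genes.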

Use of a DNA microarray to characterize the transcriptional consequences of mutations affecting the activity of regulatory molecules provides a simple and powerful approach to dissection and characterization of regulatory pathways and networks. This strategy also has an important practical application in drug screening. Mutations in specific genes encoding candidate drug targets can serve as surrogates for the ideal chemical inhibitor or modulator of their activity. DNA microarrays can be used to define the resulting signature pattern of alterations in gene expression, and can then be used in an assay to screen for compounds that reproduce the desired signature pattern.

DNA microarrays provide a simple and economical way to explore gene expression patterns on a genomic scale. The hurdles to extending this approach to any other organism are minor. The equipment required for fabricating and using DNA microarrays (9) consists of components that were chosen for their modest cost and simplicity. It was feasible for a small group to accomplish the amplification of more than 6000 genes in about 4 months and, once the amplified gene sequences were in hand, only 2 days were required to print a set of 110 microarrays of 6400 elements each. Probe preparation, hybridization, and fluorescent imaging are also simple procedures. Even conceptually simple experiments, such as those described here, can yield vast amounts of information. The value of the information from each experiment of this kind will progressively increase as more is learned about the functions of each gene and as additional experiments define the global changes in gene expression in diverse other natural processes and genetic perturbations. Perhaps the greatest challenge now is to develop efficient methods for organizing, distributing, interpreting, and extracting insights from the large volumes of data these experiments will provide.
We describe here a method for drug target validation and identification of secondary drug target effects based on genome-wide gene expression patterns. The method is demonstrated by several experiments, including treatment of yeast mutant strains defective in calcineurin, immunophilins or other genes with the immunosuppressants cyclosporin A or FK506. Presence or absence of the characteristic drug 'signature' pattern of altered gene expression in drug-treated cells with a mutation in the gene encoding a putative target established whether that target was required to generate the drug signature. Drug-dependent effects were seen in 'targetless' cells, showing that FK506 affects additional pathways independent of calcineurin and the immunophilins. The described method permits the direct confirmation of drug targets and recognition of drug-dependent changes in gene expression that are modulated through pathways distinct from the drug's intended target. Such a method may prove useful in improving the efficiency of drug development programs.



Good drugs are potent and specific; that is, they must have strong effects on a specific biological pathway and minimal effects on all other pathways. Confirmation that a compound inhibits the intended target (drug target validation) and the identification of undesirable secondary effects are among the main challenges in developing new drugs. Comprehensive methods that enable researchers to determine which genes or activities are affected by a given drug might improve the efficiency of the drug discovery process by quickly identifying potential protein targets, or by accelerating the identification of compounds likely to be toxic. DNA microarray technology, which permits simultaneous measurement of the expression levels of thousands of genes, provides a comprehensive framework to determine how a compound affects cellular metabolism and regulation on a genomic scale 1-6. DNA microarrays that contain essentially every open reading frame (ORF) in the Saccharomyces cerevisiae genome have already been used successfully to explore the changes in gene expression that accompany large changes in cellular metabolism or cell cycle progression 7-10.

In the modern drug discovery paradigm, which typically begins with the selection of a single molecular target, the ideal inhibitory drug is one that inhibits a single gene product so completely and so specifically that it is as if the gene product were absent. Treating cells with such a drug should induce changes in gene expression very similar to those resulting from deleting the gene encoding the drug's target. Here we have compared the genome-wide effects on gene expression that result from deletions of various genes in the budding yeast S. cerevisiae to the effects on gene expression that result from treatment with known inhibitors of those gene products. Using the calcineurin signaling pathway as a model system, we tested an approach that permits identification of genes that encode proteins specifically involved in pathways affected by a drug. The FK506 characteristic pattern, or 'signature', of altered gene expression was not observed in mutant cells lacking proteins inhibited by FK506 (for example, a calcineurin or FK506-binding-protein mutant strain), but was observed in mutants deleted for genes in pathways unrelated to FK506 action (for example, a cyclophilin mutant strain). Conversely, the cyclosporin A (CsA) signature was not observed in CsA-treated calcineurin or cyclophilin mutant strains, but was seen in an FK506-binding-protein mutant strain treated with CsA. The method also demonstrates that FK506, a clinically used immunosuppressant, has 'off-target' effects that are independent of its binding to immunophilins. Thus, the approach we describe may provide a way to identify the pathways altered by a drug and to detect drug effects mediated through unintended targets.

Null mutants phenocopy drug-treated cells on a genomic scale 
To test whether a null mutation in a drug target serves as a model of an ideal inhibitory drug, we examined the effects on gene expression associated with pharmacological or genetic inhibition of calcineurin function. Calcineurin is a highly conserved calcium- and calmodulin-activated serine/threonine protein phosphatase implicated in diverse processes dependent on calcium signaling 12,13. In budding yeast, calcineurin is required for intracellular ion homeostasis 14, for adaptation to prolonged mating pheromone treatment 15 and in the regulation of the onset of mitosis 16. In mammals, calcineurin has been implicated in T-cell activation 12, in apoptosis 17, in cardiac hypertrophy 18 and in the transition from short-term to long-term memory 19. In both organisms, calcineurin activity is inhibited by FK506 and CsA, immunosuppressant drugs whose effects on calcineurin are mediated through families of intracellular receptor proteins called immunophilins 12,20 (Fig. 1). To assess the effects of pharmacologic inhibition of calcineurin, wild-type S. cerevisiae was grown to early logarithmic phase in the presence or absence of FK506 or CsA. Isogenic cells, from which the genes encoding the catalytic subunits of calcineurin (CNA1 and CNA2) had been deleted 21 (referred to as the cna or calcineurin mutant), were grown in parallel, in the absence of the drug. Fluorescently labeled cDNA was prepared by reverse transcription of polyA+ RNA in the presence of Cy3- or Cy5-deoxynucleotide triphosphates and then hybridized to a microarray containing more than 6,000 DNA probes representing 97% of the known or predicted ORFs in the yeast genome. Simultaneous hybridization of Cy5-labeled cDNA from mock-treated cells and Cy3-labeled cDNA from cells treated with 1 µg/ml FK506 allowed the effect of drug treatment on mRNA levels of each ORF to be determined (Fig. 2a and b and data not shown). Similarly, effects of the calcineurin mutations on the mRNA levels of each gene were assessed by simultaneous hybridization of Cy5-labeled cDNA from wild-type cells and Cy3-labeled cDNA from the calcineurin mutant strain (Fig. 2c). For each comparison of this kind, reported expression ratios are the average of at least two hybridizations in which the Cy3 and Cy5 fluors were reversed to remove biases that may be introduced by gene-specific differences in incorporation of the two fluors (data not shown).

Fig. 1 Model of antagonism of the calcineurin signaling pathway mediated by FK506 and cyclosporin A (CsA). Calcineurin activity is composed of a catalytic subunit (calcineurin A, encoded in yeast by the CNA1 and CNA2 genes), and calcium-binding regulatory subunits calmodulin (CMD) and calcineurin B (CnB). After entering cells, FK506 and CsA specifically bind and inhibit the peptidyl-proline isomerase activity of their respective immunophilins, FK506 binding proteins (FKBP) and cyclophilins (CyP). The most abundant immunophilins in yeast (Fpr1 and Cph1) are thought to mediate calcineurin inhibition. Drug-immunophilin complexes bind and inhibit the calcium- and calmodulin-stimulated phosphatase calcineurin. Among the substrates of calcineurin are transcriptional activators that act to modulate gene expression.
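Reversing the fluors helps because a gene-specific labeling bias pushes the two hybridizations in opposite directions. The Python sketch below illustrates this. It assumes the bias is a constant multiplicative factor on the Cy5 signal and that "average" means the geometric mean, that is, the mean of the log ratios. Both are assumptions, since the text only says the ratios are averaged. The function names and toy numbers are hypothetical.

```python
import math


def sample_over_reference(cy3: float, cy5: float, sample_dye: str) -> float:
    """Ratio of sample to reference signal for one spot in one hybridization."""
    if sample_dye == "Cy3":
        return cy3 / cy5
    if sample_dye == "Cy5":
        return cy5 / cy3
    raise ValueError("sample_dye must be 'Cy3' or 'Cy5'")


def dye_swap_average(hybridizations: list[tuple[float, float, str]]) -> float:
    """Geometric mean of sample/reference ratios across dye-swapped
    hybridizations (mean of log10 ratios, converted back)."""
    logs = [math.log10(sample_over_reference(cy3, cy5, dye))
            for cy3, cy5, dye in hybridizations]
    return 10 ** (sum(logs) / len(logs))


# Toy spot: true 4-fold induction, but this gene incorporates Cy5 1.5x better.
true_ratio, cy5_bias = 4.0, 1.5
hyb_1 = (true_ratio * 100.0, 100.0 * cy5_bias, "Cy3")  # treated sample in Cy3
hyb_2 = (100.0, true_ratio * 100.0 * cy5_bias, "Cy5")  # treated sample in Cy5
print(round(sample_over_reference(*hyb_1), 3), round(sample_over_reference(*hyb_2), 3))  # 2.667 6.0
print(round(dye_swap_average([hyb_1, hyb_2]), 3))  # 4.0
```

Each single hybridization is off by the bias factor in opposite directions, and the log-scale average cancels it exactly under this multiplicative model.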

Treatment with FK506 in these growth conditions resulted in a signature pattern of altered gene expression in which mRNA levels of 36 ORFs changed by more than twofold (http://www.rosetta.org). A very similar pattern of altered gene expression was observed when the calcineurin mutant strain was compared to wild-type cells. Comparing, for each gene, the change in mRNA level caused by FK506 treatment of wild-type cells with the change caused by deletion of the calcineurin genes showed that the two perturbations produced considerably similar global transcript alterations (Fig. 2b-d). Quantification of this similarity using the correlation coefficient (ρ) showed large correlations between the FK506 treatment signature and the calcineurin deletion signature (ρ = 0.75 ± 0.03), as well as the CsA treatment signature (ρ = 0.94 ± 0.02), but not with a randomly selected deletion mutant strain (deleted for the YER071C gene; ρ = -0.07 ± 0.04; Fig. 2e). The FK506 treatment signature was also compared with those of more than 40 other deletion mutant strains or drug treatments thought to affect unrelated pathways, and none had statistically significant correlations. These data establish that genetic disruption of calcineurin function provides a close and specific phenocopy of treatment with FK506 or CsA.
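The correlation coefficient ρ used above compares two signatures ORF by ORF. Below is a minimal Python sketch of one way to compute it. It assumes that ρ is a Pearson correlation on log10 expression ratios over the ORFs shared by both signatures, as in the log10 scatter plots of Fig. 2d and e. The text does not say how the ± values were obtained. A bootstrap over ORFs is one common choice, used here only as an illustration. The function names and toy data are hypothetical.

```python
import math
import random


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def signature_correlation(ratios_a: dict[str, float], ratios_b: dict[str, float],
                          n_boot: int = 200, seed: int = 0) -> tuple[float, float]:
    """Pearson rho between two signatures on log10 ratios of shared ORFs, with a
    bootstrap standard error from resampling ORFs (an assumed error estimate)."""
    orfs = sorted(ratios_a.keys() & ratios_b.keys())
    x = [math.log10(ratios_a[o]) for o in orfs]
    y = [math.log10(ratios_b[o]) for o in orfs]
    rho = pearson(x, y)
    rng = random.Random(seed)
    boots = []
    for _ in range(n_boot):
        idx = [rng.randrange(len(orfs)) for _ in orfs]
        bx, by = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(bx)) > 1 and len(set(by)) > 1:  # skip degenerate resamples
            boots.append(pearson(bx, by))
    mean_boot = sum(boots) / len(boots)
    se = math.sqrt(sum((r - mean_boot) ** 2 for r in boots) / (len(boots) - 1))
    return rho, se


# Toy signatures: expression ratios keyed by ORF name (not measured data).
wt_fk506 = {f"ORF{i}": 10 ** (0.1 * ((i % 7) - 3)) for i in range(30)}
cna = {orf: r ** 2 for orf, r in wt_fk506.items()}  # same direction, larger changes
rho, se = signature_correlation(wt_fk506, cna)
print(round(rho, 3))  # 1.0
```

Working on log ratios makes an induction and a repression of the same size symmetric about zero. Without the log, large inductions would dominate the correlation.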

To avoid generalizing from a single example, we also compared the effects of treatment of wild-type cells with 3-aminotriazole (3-AT) with the effects of deletion of the HIS3 gene. HIS3 encodes imidazoleglycerol phosphate dehydratase, which catalyzes the seventh step of the histidine biosynthetic pathway in yeast 22; 3-AT is a competitive inhibitor of this enzyme that triggers a large transcriptional amino-acid starvation response 23. Microarray analysis of wild-type and isogenic his3-deficient strains demonstrated the expected large genome-wide transcriptional responses (involving more than 1,000 ORFs) resulting from treatment with 3-AT (Fig. 3a) or from HIS3 deletion (Fig. 3c). Quantitative comparison of the 3-AT treatment signature and the his3 mutant signature showed a high level of correlation (ρ = 0.76 ± 0.02) that even extended to genes that experienced small changes in expression level (Fig. 3b). As a negative control, the correlations between the 3-AT treatment signature or the his3 mutant signature and the calcineurin mutant strain were not statistically significant (ρ = 0.09 ± 0.06 and -0.01 ± 0.04, respectively). That both the calcineurin/FK506 and the his3/3-AT comparisons were highly correlated indicates that in many cases the expression profile resulting from a gene deletion closely resembles the expression profile of wild-type cells treated with an inhibitor of that gene's product.

'Decoder' strategy: Drug target validation with deletion mutants 

Because pharmacological inhibition of different targets might give similar or identical expression profiles, simple comparison of drug signatures to mutant signatures is unlikely to unambiguously identify a drug's target. To overcome this limitation, an additional 'decoder' step is used. We first compare the expression profile of wild-type drug-treated cells to the expression profiles from a panel of genetic mutant strains, using a correlation coefficient metric. Mutant strains whose expression profile is similar to that of drug-treated wild-type cells are selected and subjected to drug treatment, generating the drug signature in the mutant strain (that is, the mutant drug signature). If the mutated gene encodes a protein involved in a pathway affected by the drug, we expect the drug signature in mutant cells to be different (or absent, for an ideal drug) from the drug signature seen in wild-type cells.
Fig. 2 Expression profiles from FK506-treated wild-type (wt) cells and a calcineurin-disruption mutant strain share a genome-wide correlation. DNA microarray analysis showing changes in gene expression resulting from FK506 treatment (a and b) or from genetic disruption of genes encoding calcineurin (c). a, Pseudo-color image of the results of simultaneous hybridization of Cy5-labeled cDNA (red) from mock-treated strain R563 and Cy3-labeled cDNA (green) from strain R563 treated with 1 µg/ml FK506. b, Enlarged view of the boxed area in a. Arrowheads indicate specific ORFs induced or repressed. c, Pseudo-color image of the results of simultaneous hybridization of Cy5-labeled cDNA (red) from strain R563 and Cy3-labeled cDNA (green) from strain MCY300 (deleted for the CNA1 and CNA2 catalytic subunits of calcineurin). Arrows indicate specific ORFs induced or repressed. d, The log10 of the expression ratio for each ORF derived from the FK506 treatment hybridizations is plotted versus the log10 of the expression ratio in the calcineurin mutant hybridizations. ORFs that were induced or repressed in both experiments are shown as green and red dots, respectively. e, The log10 of the expression ratio for each ORF derived from the FK506 treatment hybridizations is plotted versus the log10 of the expression ratio in the yer071c mutant hybridizations. No ORFs were induced or repressed in both experiments.



To illustrate this, we treated the his3 mutant strain with 3-AT. The signature pattern of altered gene expression resulting from treatment of the mutant strain with 3-AT was much less complex than that of the 3-AT signature in wild-type cells (Fig. 4). This is seen simply by examining plots of mean intensity of the hybridization signal (which approximately reflects level of expression) versus the expression ratio for each ORF (Fig. 4). Genes that were expressed at higher or lower levels in 3-AT-treated cells or in his3 mutant cells are shown as red and green dots, respectively. We analyzed the 3-AT signature in wild-type (Fig. 4a) and his3 mutant cells (Fig. 4c), as well as the his3 mutant strain signature (Fig. 4b). Whereas histidine limitation imposed by 3-AT caused more than 1,000 transcript-level changes in the wild-type strain, few or no transcript-level changes were induced by treatment of the his3-deletion strain with 3-AT. This indicates that, under the growth conditions used, essentially all of the effects of 3-AT depend on or are mediated through the HIS3 gene product.
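The contrast above comes down to counting how many ORFs show a real change in each experiment, while discounting weak spots near background. In the paper, changes are called at the 95% confidence level using an error model that is not described here. This Python sketch substitutes a plain fold-change cutoff and a minimum-intensity cutoff. Both thresholds, the function name, and the toy values are assumptions.

```python
import math


def count_changed(spots: list[tuple[float, float]], min_fold: float = 2.0,
                  min_intensity: float = 100.0) -> int:
    """Count ORFs whose ratio changes by at least min_fold in either direction,
    ignoring spots below min_intensity (too close to background)."""
    cutoff = math.log10(min_fold)
    return sum(
        1
        for intensity, ratio in spots
        if intensity >= min_intensity and abs(math.log10(ratio)) >= cutoff
    )


# Toy (mean intensity, expression ratio) pairs, not measured values.
wt_3at = [(500.0, 3.0), (800.0, 0.4), (50.0, 5.0), (1200.0, 1.1)]
his3_3at = [(500.0, 1.1), (800.0, 0.95), (50.0, 4.0), (1200.0, 1.05)]
print(count_changed(wt_3at), count_changed(his3_3at))  # 2 0
```

The intensity filter plays the role of the threshold line in the intensity-versus-ratio plots: low-intensity spots can show large ratios from noise alone, as the 50-intensity spots here do.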

Applying this approach to the calcineurin signaling pathway showed the specificity of the method. The calcineurin mutant strain and strains with deletions in the genes encoding the most abundant immunophilins in yeast 12 (CPH1 and FPR1) were treated with either FK506 or CsA to determine the profiles of altered gene expression resulting from drug treatment of the mutant cells (that is, mutant +/- drug). We compared the drug signatures in the mutants to the wild-type drug signature using the correlation coefficient metric (Table 1). Although the signature generated by treatment of wild-type cells with FK506 was highly correlated to the calcineurin mutant strain signature (ρ = 0.75 ± 0.03), it bore no similarity to the profile after treatment of the calcineurin mutant strain with FK506 (ρ = -0.01 ± 0.07). This indicates that FK506 was unable to elicit its normal transcriptional response in the calcineurin mutant strain. Likewise, treatment of the fpr1 mutant strain with FK506 elicited an expression profile that was not correlated to the FK506 signature in the wild-type strain (ρ = -0.23 ± 0.07), indicating that the FPR1 gene product is likely to be involved in the pathway affected by FK506. The same was true for the cna fpr1 mutant strain. In contrast, treatment of the cph1 mutant strain with FK506 generated an expression profile highly correlated with the wild-type FK506 expression profile (ρ = 0.79 ± 0.03), indicating that the cph1 mutation did not block the mode of action of FK506 and thus that Cph1 is not directly involved in the pathway affected by FK506. We tabulated the change in expression in response to FK506 in different mutant strains for all ORFs with expression ratios greater than 1.8 in FK506-treated cells or in the calcineurin mutant strain (Fig. 5a). The calcineurin mutant strain signature and the FK506 responses in wild-type and the cph1 mutant strain are similar, and there are no transcript-level changes (seen in black) for treatment of the calcineurin, fpr1 and cna fpr1 mutant strains with FK506 (Fig. 5a).

Table 1 Signature correlation of expression ratios as a result of FK506 treatment in various mutant strains

                      wild-type     cna           fpr1          cna fpr1      cph1
                      +/- FK506     +/- FK506     +/- FK506     +/- FK506     +/- FK506
wild-type +/- FK506   0.93 ± 0.04   -0.01 ± 0.07  -0.23 ± 0.07  0.12 ± 0.07   0.79 ± 0.03

Signature correlation shows the absence of the FK506 signature specifically in the calcineurin (cna) and fpr1 (major FK506 binding protein) deletion mutants. cna represents the mutant with deletions of the catalytic subunits of calcineurin, CNA1 and CNA2. The correlation coefficient reported in the first column represents the correlation between two pairs of hybridizations from independent wild-type +/- FK506 experiments.
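The reported FK506 correlations can be read as a decoder readout, in which a strain whose drug signature collapses marks a gene needed for the drug's effect. The values below are taken from the text above: cna -0.01, fpr1 -0.23, cph1 0.79, and 0.93 for the wild-type replicate comparison. The remaining Table 1 value, 0.12, is assigned to the cna fpr1 strain by elimination. The 0.5 cutoff and the function name are hypothetical choices for illustration.

```python
# Correlation of each strain's FK506 signature (mutant +/- FK506) with the
# wild-type FK506 signature: (rho, reported ± value).
TABLE1_FK506 = {
    "wild-type": (0.93, 0.04),
    "cna": (-0.01, 0.07),
    "fpr1": (-0.23, 0.07),
    "cna fpr1": (0.12, 0.07),
    "cph1": (0.79, 0.03),
}


def signature_lost(table: dict[str, tuple[float, float]], retained_min: float = 0.5) -> list[str]:
    """Strains in which the drug signature is abolished (rho below retained_min)."""
    return sorted(strain for strain, (rho, _err) in table.items() if rho < retained_min)


print(signature_lost(TABLE1_FK506))  # ['cna', 'cna fpr1', 'fpr1']
```

With any cutoff between about 0.12 and 0.79, the result matches the conclusion in the text: calcineurin and Fpr1 are required for the FK506 signature, and Cph1 is not.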

Similar experiments and analyses with CsA provided further validation of this approach. The expression profile elicited by treatment of wild-type cells with CsA was highly correlated to the profile elicited by mutation of the calcineurin genes (ρ = 0.71 ± 0.04), but did not correlate with the expression profile resulting from treatment of the calcineurin mutant strain with CsA (ρ = -0.05 ± 0.07; Table 2), indicating that the genetic deletion of calcineurin interfered with the ability of CsA to elicit its normal transcriptional response. Likewise, the CsA signature was essentially absent in CsA-treated cph1 mutant cells, and the expression profile of CsA-treated cph1 mutant cells correlated poorly with that of CsA-treated wild-type cells (ρ = 0.18 ± 0.07). Thus, the CPH1 gene product was required for the CsA response seen in wild-type cells. Conversely, treatment of fpr1 mutant cells with CsA resulted in an expression pattern very similar to the profile of CsA-treated wild-type cells (ρ = 0.77 ± 0.03), indicating that FPR1 was not necessary for the CsA-mediated effects. Analysis of individual ORFs affected by CsA and their expression ratios over the entire set of experiments confirmed that CPH1 and the genes encoding calcineurin, but not FPR1, are necessary for the wild-type CsA response (Fig. 5b). The observation that the profiles resulting from FK506 or CsA drug treatment are similar to that of the calcineurin deletion mutant strain might allow the prediction that calcineurin was involved in the pathway affected by these drugs. Because the expression profile of the fpr1 mutant strain did not bear a strong similarity to the wild-type FK506 expression profile, however, drug treatment of the mutant strains was necessary to identify Fpr1, but not Cph1, as a potential FK506 drug target. In the same way, the 'decoder' strategy was necessary to identify Cph1, but not Fpr1, as a potential drug target for CsA.

Fig. 3 Expression profiles from a his3 mutant strain and wild-type (wt) cells treated with 3-AT share a genome-wide correlation. DNA microarray analysis showing changes in gene expression resulting from 3-AT treatment (a) or from genetic disruption of the HIS3 gene (c). a, Pseudo-color image of the results of simultaneous hybridization of Cy5-labeled cDNA (red) from mock-treated wild-type strain R491 and Cy3-labeled cDNA (green) from strain R491 treated with 10 mM 3-AT. b, The log10 of the expression ratio for each ORF derived from the 3-AT treatment hybridizations is plotted versus the log10 of the expression ratio in the his3 mutant hybridizations. ORFs that were induced or repressed in both experiments are shown as green and red dots, respectively. The correlation of expression ratios applies not only to genes with large expression ratios (for example, CHA1 and ARG1), but also extends to genes with expression ratios less than 2 (for example, ILV1 and CPH1). ILV1 is induced 1.9-fold and 1.5-fold, and CPH1 is downregulated 1.9-fold and 1.7-fold, in cells treated with 3-AT and his3 mutant cells, respectively. Two ORFs do not fall on the line x = y. The leftmost point is the HIS3 data point, which is induced by 3-AT treatment but is absent from the his3 mutant strain. The other point is YOR203w. Both data points are labeled HIS3 because hybridization to YOR203w is most likely due to HIS3 mRNA, as YOR203w overlaps the HIS3 open reading frame. c, Pseudo-color image of the results of simultaneous hybridization of Cy5-labeled cDNA (red) from wild-type strain R491 and Cy3-labeled cDNA (green) from strain R1226, deleted for the HIS3 gene. Arrowheads indicate specific ORFs induced or repressed.

[Fig. 4 panel labels: wt -/+ 10 mM 3-AT; his3 mutant -/+ 10 mM 3-AT; axis label: log10 (intensity)]

Fig. 4 Treatment of the his3 mutant strain with 3-AT shows nearly complete loss of the 3-AT signature. A plot of the log10 of the mean intensity of hybridization for each ORF versus the log10 of its expression ratio for each experiment is shown next to a pseudo-color image of a representative portion of the microarray. ORFs that are induced or repressed at the 95% confidence level are shown in green and red, respectively. a, Expression profile from treatment of the wild-type (wt) strain with 3-AT. Cy5-labeled cDNA (red) from mock-treated strain R491 and Cy3-labeled cDNA (green) from strain R491 treated with 10 mM 3-AT. b, Expression profile from the his3 deletion strain. Cy5-labeled cDNA (red) from strain R491 and Cy3-labeled cDNA (green) from strain R1226, deleted for the HIS3 gene. c, Expression profile of treatment of the his3 deletion strain with 3-AT. Cy3-labeled cDNA (red) from his3-deleted strain R1226 and Cy5-labeled cDNA (green) from strain R1226 treated with 10 mM 3-AT. Arrowheads indicate the DNA probe and data point corresponding to the HIS3 gene. The blue dashed line represents the threshold below which errors tend to increase rapidly because spot intensities are not sufficiently above background intensity.

'Decoder' approach can identify secondary drug effects

For a drug that has a single biochemical target, the strategy outlined above may be useful in target validation. In many cases, however, a compound may affect multiple pathways and elicit a very complex signature. 'Decoding' such a complex signature
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Table 2 Signature correlation of expression ratios as a result of CsA 
treatment in various mutant strains 





wild-type 


cna 


fprl 


cna cphl 


cphl 




+/-CsA 


+/-CsA 


+/-CsA 


+/-CsA 


+/-CsA 


wild-type 












+/- CsA 


0.94 * 0.04 


-0.05 ± .07 


0.77 ± 0.03 


-0.11 ±0.07 


0.18 ±0.07 



Signature correlation shows the absence of the CsA signature specifically in the calcineurin (cna) and cphl 
(cyctophtlin) deletion mutants, cna represents the mutant with deletions of the catalytic su bun its of cal- 
cineurin, CNA 1 and CNA2, The correlation coefficient reported in the first column represents the correlation 
between two pairs of hybridizations from independent wild-type +/- CsA experiments. 



into the effects mediated through the intended target {the *on- 
target signature') and those mediated through unintended tar- 
gets (the 'off-target* signature) might be useful in evaluating a 
compound's specificity. Our 'decoder* strategy is based on the 
premise that 'off-target' signature should be insensitive to the 
genetic disruption of the primary target. 
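This premise can be phrased as a per-gene rule: a drug response that survives deletion of the primary target belongs to the 'off-target' signature. The sketch below illustrates that rule only and is not the authors' procedure. It assumes signatures are dicts of log10 expression ratios; the function name `split_signature`, the placeholder gene names and the twofold cutoff are hypothetical choices.

```python
import math

# Hypothetical cutoff: call a gene responsive if its ratio changes at least twofold.
TWOFOLD = math.log10(2.0)


def split_signature(wt_drug, mutant_drug, threshold=TWOFOLD):
    """Split a wild-type drug signature into on-target and off-target genes.

    wt_drug:     log10 ratios (drug vs. mock) in wild-type cells.
    mutant_drug: log10 ratios (drug vs. mock) in cells lacking the primary target.
    Returns (on_target, off_target) as sets of gene names.
    """
    on_target, off_target = set(), set()
    for gene, wt in wt_drug.items():
        if abs(wt) < threshold:
            continue  # not part of the wild-type drug signature
        mut = mutant_drug.get(gene, 0.0)
        # Off-target: the response survives loss of the target, in the same direction.
        if abs(mut) >= threshold and mut * wt > 0:
            off_target.add(gene)
        else:
            on_target.add(gene)
    return on_target, off_target


wt = {"GENE_A": 0.6, "GENE_B": 0.5, "GENE_C": 0.05}
mutant = {"GENE_A": 0.02, "GENE_B": 0.45, "GENE_C": 0.0}
print(split_signature(wt, mutant))  # ({'GENE_A'}, {'GENE_B'})
```

Requiring the same sign in both strains keeps a gene whose response flips direction in the mutant out of the 'off-target' set; such genes would need separate interpretation.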

To determine whether the 'decoder' approach could identify an 'off-target' profile, we looked for drug-responsive genes whose expression is insensitive to deletion of the primary target. To increase the likelihood of observing such genes, the same strains described in Tables 1 and 2 were treated with a higher concentration (50 µg/ml) of FK506. This led to a much more complex expression profile in wild-type cells, indicating that at this higher concentration FK506 was inhibiting or activating additional targets. Several of the ORFs in this expanded FK506-induced expression profile were not affected by the calcineurin, cph1 or fpr1 mutations, as drug treatment of these mutant strains did not block their presence in the FK506 expression signature (Fig. 6). This indicates that FK506 was triggering changes in transcript levels of many genes through pathways independent of calcineurin, CPH1 and FPR1. Many of the upregulated ORFs in the 'off-target' pathway were genes reported to be regulated by the transcriptional activator Gcn4 (ref. 24). In some strains, a reporter gene under GCN4 control was induced in response to FK506 treatment (ref. 25). To determine whether GCN4 is involved in this pathway that is independent of calcineurin, CPH1 and FPR1, we analyzed the effects of treatment with high-dose FK506 on global gene expression in a strain with a GCN4 deletion (Fig. 6). Of the 41 ORFs with calcineurin-independent expression ratios greater than 4, 32 were not induced in the gcn4 mutant, indicating that their induction by FK506 was GCN4-dependent. Not all GCN4-regulated genes were induced by FK506. This FK506-induced subset of GCN4-regulated genes may be those most sensitive to subtle changes in Gcn4 levels, or perhaps other regulatory circuits prevent FK506 activation of some GCN4-regulated genes. Seven of the remaining nine ORFs induced by FK506 were independent of both the calcineurin and GCN4 pathways. The simplest explanation is that FK506 inhibits or activates additional pathways. Members of this class include SNQ2 and PDR5, genes that encode drug efflux pumps with structural homology to mammalian multiple drug resistance proteins (ref. 26). FK506 may interact directly with Pdr5 to inhibit its function (ref. 27). Our results indicate that treatment with FK506 leads to a fourfold-to-sixfold induction of PDR5 mRNA levels. YOR1, another gene that can confer drug resistance, is also induced threefold-to-fourfold by FK506. Thus, drug treatment of strains with mutations in the primary targets can prove useful in identifying effects mediated by secondary drug targets, including the nature and extent of newly discovered and previously unsuspected pathways affected by the drug.
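The class definitions in the Fig. 6 legend amount to a decision rule over per-strain response calls. A minimal sketch, assuming each gene has already been called as responding or not in each strain; the function name, strain keys and placeholder genes are hypothetical, and how a "response" is called (for example, a ratio cutoff) is left to the caller.

```python
from collections import Counter

# Deletions expected to block calcineurin/Fpr1-mediated FK506 responses (Fig. 6).
CNA_BLOCKING = frozenset({"cna", "fpr1", "cna fpr1"})


def classify_fk506_response(responds):
    """Assign a gene to one of the Fig. 6 classes.

    responds: dict mapping strain name -> True if the gene responded to FK506
    in that strain.
    """
    lost = frozenset(strain for strain, r in responds.items() if not r)
    if not lost:
        return "CNA- and GCN4-independent"  # responds in every strain tested
    if lost == CNA_BLOCKING:
        return "CNA-dependent"              # lost only without calcineurin/Fpr1
    if lost == {"gcn4"}:
        return "GCN4-dependent"             # lost only without Gcn4
    return "complex behavior"


STRAINS = ["wild-type", "cna", "fpr1", "cna fpr1", "cph1", "gcn4"]
calls = {  # placeholder genes, not data from the study
    "GENE_A": {**dict.fromkeys(STRAINS, True), "gcn4": False},
    "GENE_B": dict.fromkeys(STRAINS, True),
    "GENE_C": {**dict.fromkeys(STRAINS, True), "cna": False, "fpr1": False, "cna fpr1": False},
}
print(Counter(classify_fk506_response(c) for c in calls.values()))
```

Exact set matching is deliberately strict: any pattern of lost responses other than the two modeled ones falls into 'complex behavior', mirroring the catch-all class in the legend.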

We describe here a method for drug target validation and the identification of secondary drug target effects that uses DNA microarrays to survey the effects of drugs on global gene expression patterns. We established that genetic and pharmacologic inhibition of gene function can result in extremely similar changes in gene expression. We also demonstrated that one can confirm a potential drug target by treating a deletion mutant defective in the gene encoding the putative target. Drug-mediated signatures from strains with mutations in pathways or processes directly or indirectly affected by the drug bore little or no similarity to the wild-type drug expression profile. In contrast, drug-mediated signatures from strains with mutations in genes involved in pathways unrelated to the drug's action showed extensive similarity to the wild-type drug signature. By applying this approach to a drug that affects multiple pathways (FK506), we were able to decode a complex signature into its component parts, including the identification of an 'off-target' signature that was mediated through pathways independent of calcineurin or the Fpr1 immunophilin.

Fig. 5 Response of FK506 and CsA signature genes in strains with deletions in different genes. Genes with expression ratios greater than a factor of 1.8 in response to treatment with 1 µg/ml FK506 (a) or 50 µg/ml CsA (b) are listed (left side) and their expression ratios in the indicated strain are shown on the green (induction)-red (repression) color scale. a, Calcineurin (cna) mutant and FK506 treatment signature genes are in the first two columns. Almost all FK506 signature genes have expression ratios near unity in deletion strains involved in pathways affected by FK506 (calcineurin, fpr1 and cna fpr1 mutants) but not in deletion strains in unrelated pathways (cph1). b, Calcineurin (cna) mutant and CsA treatment signature genes are in the first two columns. Almost all CsA signature genes have expression ratios near unity in deletion strains involved in pathways affected by CsA (calcineurin, cph1 and cna cph1 mutants) but not in deletion strains in unrelated pathways (fpr1).

Discussion 

It is well established that high-throughput biochemical screening can identify potent inhibitory compounds against a given target. The 'decoder' approach described here complements this process by evaluating the equally important property of specificity: the tendency of a compound to inhibit pathways other than that of its intended target. The ability to observe such 'off-target' effects will likely be useful in several ways. Profiling compounds with known toxicities will allow the development of a database of expression changes associated with particular toxicities. Recognition of potential toxicities in the 'off-target' signatures of otherwise promising compounds then may allow earlier identification of those likely to fail in clinical trials. Comparing the extent and peculiarities of the 'off-target' signatures of promising drug candidates could provide a new way to group compounds by their effects on secondary pathways, even before those effects are understood. This may prove to be an alternative, potentially more effective, way to select compounds for animal and clinical trials. Some drugs are more effective against a related protein than against the originally intended target. Sildenafil (Viagra™), for example, was initially developed as a phosphodiesterase inhibitor to control cardiac contractility, but was found to be highly specific for phosphodiesterase 5, an isozyme whose inhibition overcomes defects in penile erection. It is possible that application of the 'decoder' to other compounds may show that they too have a potent activity against a target distinct from their intended target.

Fig. 6 Response of FK506 signature genes in strains with deletions in different genes. Genes with expression ratios greater than a factor of 4 in at least one experiment are listed and their expression ratios in the indicated strain are shown on the green (induction)-red (repression) color scale. The genes have been divided into classes corresponding to these expected behaviors: 'CNA-dependent' genes respond to FK506 (50 µg/ml) except when either the calcineurin genes or FPR1 or both are deleted; 'GCN4-dependent' genes respond to FK506 except when GCN4 is deleted. These genes still respond to FK506 when the calcineurin genes or FPR1 or CPH1 are deleted; that is, their responses are not mediated by calcineurin, Cph1 or Fpr1. 'CNA- and GCN4-independent' genes respond to FK506 in all deletion strains tested. A 'complex behavior' class is provided for those genes that did not match the model of FK506 response mediated through calcineurin or Fpr1 or separately through Gcn4.

The ability to decode drug effects is dependent on the availability of functionally 'targetless' cells. In yeast, this is being achieved by systematically disrupting each yeast gene (Saccharomyces Deletion Consortium; http://sequence-www.stanford.edu/group/yeast_deletion_project/deletion.html). Efforts are underway to obtain expression profiles from each deletion mutant strain. Determining signatures resulting from inactivation of essential genes presents a unique problem, but it may be possible to do so by examining heterozygotes or by using a controllable promoter to reduce expression of the essential gene. Although it is already feasible to test several compounds in dozens of yeast strains, another challenge for the 'decoder' strategy will be the efficient selection of the mutants with deletions in genes most likely to encode the intended drug target. The signature correlation plots described here are one metric that could be used as part of that selection process, but others need to be explored. Applying the 'decoder' to mammalian cells presents additional challenges. It is considerably more difficult to isolate functionally 'targetless' cells. Strategies involving titratable promoters, known specific inhibitors, antisense RNAs, ribozymes, and methods of targeting specific proteins for degradation are possible and should be tested. Another limitation is that not all cell types express the same set of genes, and therefore 'off-target' effects may differ between cell types. In addition, applying the 'decoder' to human cells will require technical improvements that allow expression profiling from a small number of cells. Even the broader question of whether the insensitivity of 'off-target' signatures to the disruption of the main target is the exception or the rule can only be answered by the accumulation of more data. Barkai and Leibler, however, have argued in favor of the robustness of biological networks, indicating that drug perturbations ('off-target' signatures) may be robust even when the system is subjected to another perturbation (such as a genetic disruption) (ref. 28). Many practical developments will be necessary if the 'decoder' concept is to be broadly applied.

Expression arrays have been used mainly as an initial screen for genes induced in a particular tissue or process of interest by focusing on genes with large expression ratios. We have found, however, that effort to refine experimental protocols and repeat experiments increases the reliability of the data and permits new applications. For example, it provides a larger set of genes at higher confidence levels that serve as a more distinctive signature for a given protein perturbation. In addition, it allows subtle signatures to be detected, for example when a protein is only partially inhibited. This may enable clinical monitoring of small changes in protein function in disease or toxicity states before they could otherwise be detected. Because the functions of many genes detected on transcript arrays are known, these microarrays are powerful tools that provide detailed information about a cell's physiology. For example, changes in the flux through a metabolic pathway are reflected in transcriptional changes in genes in the pathway (ref. 7). Furthermore, it may be possible to indirectly measure protein activity levels from expression profiling data (S.F., et al., unpublished data). Thus, although the eventual development of genomic methods allowing the direct measurement of all cellular protein levels will be an important achievement, transcript array technology offers an immediate and robust means of evaluating the effects of various treatments on gene expression and protein function.

Table 3 Yeast strains used

Strain        Relevant genotype                                                                    Reference
YPH499        Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1                            (34)
R563          Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 his3::HIS3                 (this study)
R558          Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 fpr1::HIS3                 (this study)
R567          Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cph1::HIS3                 (this study)
[illegible]   Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cna1Δ1::hisG cna2Δ1::HIS3  (21)
R132          Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cna1Δ1::hisG cna2Δ1::HIS3 cph1::kanR   (this study)
R133          Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cna1Δ1::hisG cna2Δ1::HIS3 fpr1::kanR   (this study)
R559          Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 his3::HIS3 gcn4::LEU2      (this study)
BY4719        Mata trp1-Δ63 ura3-Δ0                                                                (35)
BY4738        Mata trp1-Δ63 ura3-Δ0                                                                (35)
R491          Mata/α BY4719 × BY4738                                                               (this study)
BY4728        Mata his3-Δ200 trp1-Δ63 ura3-Δ0                                                      (35)
BY4729        Mata his3-Δ200 trp1-Δ63 ura3-Δ0                                                      (35)
R1226         Mata/α BY4728 × BY4729                                                               (this study)

Methods 

Construction, growth and drug treatment of yeast strains. The strains used in this study (Table 3) were constructed by standard techniques (ref. 29). To construct strain R559, strain R563 was transformed to Leu+ with plasmid pM12 digested by SalI and MltA (provided by A. Hinnebusch and T. Dever). Strains R132 and R133 were constructed by transforming the bacterial kanamycin resistance cassette (ref. 30) flanked by genomic DNA from the CPH1 and FPR1 loci, respectively, and selecting for G418-resistant colonies. For experiments with FK506, cells were grown for three generations to a density of 1 × 10^7 cells/ml in YAPD medium (YPD plus 0.004% adenine) supplemented with 10 mM calcium chloride as described (ref. 31). Where indicated, FK506 was added to a final concentration of 1 µg/ml 0.5 h after inoculation of the culture, or to 50 µg/ml 1 h before cells were collected. CsA was used at a final concentration of 50 µg/ml. Cells were broken by standard procedures (ref. 32) with the following modifications: cell pellets were resuspended in breaking buffer (0.2 M Tris-HCl pH 7.6, 0.5 M NaCl, 10 mM EDTA, 1% SDS) and vortexed for 2 min on a VWR multi-tube vortexer at setting 8 in the presence of 60% glass beads (425-600 µm mesh; Sigma) and phenol:chloroform (50:50, volume/volume). After separation of the phases, the aqueous phase was re-extracted and ethanol-precipitated. Poly(A)+ RNA was isolated by two sequential chromatographic purifications over oligo dT cellulose (New England Biolabs, Beverly, Massachusetts) using established protocols.

For experiments using 3-AT, wild-type or his3/his3 cells were grown to early logarithmic phase in SC medium, pelleted and resuspended in SC medium lacking histidine for 1 h in the presence or absence of 10 mM 3-AT, as indicated. Cells were harvested and mRNA was isolated as above. FK506 was obtained from the Swedish Hospital Pharmacy (Seattle, Washington) and purified to homogeneity by ethyl acetate extraction by J. Simon (Fred Hutchinson Cancer Research Center, Seattle, Washington). CsA was obtained from Alexis Biochemicals (San Diego, California); 3-AT was from Sigma.

Preparation and hybridization of the labeled sample. Fluorescently labeled cDNA was prepared, purified and hybridized essentially as described (ref. 7). Cy3- or Cy5-dUTP (Amersham) was incorporated into cDNA during reverse transcription (Superscript II; Life Technologies) and purified by concentrating to less than 10 µl using Microcon-30 microconcentrators (Amicon, Houston, Texas). Paired cDNAs were resuspended in 20-26 µl hybridization solution (3× SSC, 0.75 µg/ml polyA DNA, 0.2% SDS) and applied to the microarray under a 22- × 30-mm coverslip for 6 h at 63 °C, all according to a published method (ref. 7).

Fabrication and scanning of microarrays. PCR products containing common 5' and 3' sequences (Research Genetics, Huntsville, Alabama) were used as templates with an amino-modified forward primer and unmodified reverse primers to PCR-amplify 6,065 ORFs from the S. cerevisiae genome. Our first-pass success rate was 94%. Amplification reactions that gave products of unexpected sizes were excluded from subsequent analysis. ORFs that could not be amplified from purchased templates were amplified from genomic DNA. DNA samples from 100-µl reactions were isopropanol-precipitated, resuspended in water, brought to a final concentration of 3× SSC in a total volume of 15 and transferred to 384-well microtiter plates (Genetix Limited, Christchurch, Dorset, England). PCR products were spotted onto 1- × 3-inch polylysine-treated glass slides by a robot built essentially according to defined specifications (ref. 7) (http://cmgm.stanford.edu/pbrown/MGuide). After being printed, slides were processed according to published protocols (ref. 7).

Microarrays were imaged on a prototype multi-frame CCD camera in development at Applied Precision (Issaquah, Washington). Each CCD image frame was approximately 2 mm square. Exposures of 2 s in the Cy5 channel (white light through a Chroma 618-648 nm excitation filter and a Chroma 657-727 nm emission filter) and 1 s in the Cy3 channel (Chroma 535-560 nm excitation filter, Chroma 570-620 nm emission filter) were taken consecutively in each frame before moving to the next, spatially contiguous frame. Color isolation between the Cy3 and Cy5 channels was about 100:1 or better. Frames were 'knitted' together in software to make the complete images. The intensities of spots (about 100 µm) were quantified from the 10-µm pixels by frame-by-frame background subtraction and intensity averaging in each channel. The dynamic range of the resulting spot intensities was typically a ratio of 1,000 between the brightest spots and the background-subtracted additive error level. Normalization between the channels was accomplished by normalizing each channel to the mean intensity of all genes. This procedure is nearly equivalent to normalization between channels using the intensity ratio of genomic DNA spots (ref. 7), but is possibly more robust, as it is based on the intensities of several thousand spots distributed over the array.

NATURE MEDICINE • VOLUME 4 • NUMBER 11 • NOVEMBER 1998    1299

ARTICLES

©1998 Nature America Inc. • http://medicine.nature.com
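Normalizing each channel to its mean intensity removes a global difference in labeling or detection efficiency between Cy5 and Cy3. A minimal sketch, assuming intensities are already background-subtracted and stored in dicts keyed by gene; the function name is hypothetical.

```python
import math
from statistics import fmean


def normalized_log10_ratios(cy5, cy3):
    """Scale each channel by its mean spot intensity, then take log10(Cy5/Cy3).

    cy5, cy3: dicts of background-subtracted spot intensities keyed by gene.
    Genes with a non-positive intensity in either channel are skipped.
    """
    mean5 = fmean(cy5.values())
    mean3 = fmean(cy3.values())
    ratios = {}
    for gene, i5 in cy5.items():
        i3 = cy3.get(gene, 0.0)
        if i5 > 0 and i3 > 0:
            ratios[gene] = math.log10((i5 / mean5) / (i3 / mean3))
    return ratios


# A uniform twofold brighter Cy5 channel normalizes away to ratios of 1 (log10 = 0).
print(normalized_log10_ratios({"A": 200, "B": 400, "C": 1000},
                              {"A": 100, "B": 200, "C": 500}))
```

Because the mean is taken over thousands of spots, a handful of strongly regulated genes shifts it only slightly, which is the robustness argument made in the text.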

Signature correlation coefficients and their confidence limits. Correlation coefficients between the signature ORFs of various experiments were calculated using:

$$\rho = \frac{\sum_k x_k y_k}{\left(\sum_k x_k^2 \sum_k y_k^2\right)^{1/2}}$$

where x_k is the log10 of the expression ratio for the kth gene in the x signature, and y_k is the log10 of the expression ratio for the kth gene in the y signature. The summation is over those genes that were either up- or down-regulated in either experiment at the 95% confidence level. These genes each had a less than 5% chance of being actually unregulated (having expression ratios departing from unity due to measurement errors alone). This confidence level was assigned based on an error model that assigns a lognormal probability distribution to each gene's expression ratio, with a characteristic width based on the observed scatter in its repeated measurements (repeated arrays at the same nominal experimental conditions) and on the individual array hybridization quality. This latter dependence was derived from control experiments in which both the Cy3 and Cy5 samples were derived from the same RNA sample. For large numbers of repeated measurements the error reduces to the observed scatter. For a single measurement the error is based on the array quality and the spot intensity.
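The 95% confidence call can be illustrated with a simplified version of this error model. The sketch below assumes the per-measurement error sigma (in log10 units) is supplied directly, rather than derived from replicate scatter and array quality as described above, and treats replicate log10 ratios as independent normal draws; under a lognormal model the log ratio of an unregulated gene is centred on zero. Function names are hypothetical.

```python
import math
from statistics import fmean


def regulation_p_value(log10_ratios, sigma):
    """Two-sided probability of a deviation this large if the gene is unregulated.

    log10_ratios: replicate log10 expression ratios for one gene.
    sigma: assumed per-measurement standard deviation, in log10 units.
    """
    se = sigma / math.sqrt(len(log10_ratios))  # standard error of the mean
    z = fmean(log10_ratios) / se
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


def is_regulated(log10_ratios, sigma, alpha=0.05):
    """True if the gene is up- or down-regulated at the (1 - alpha) level."""
    return regulation_p_value(log10_ratios, sigma) < alpha


print(is_regulated([0.3, 0.25, 0.35], sigma=0.1))  # True
print(is_regulated([0.05], sigma=0.1))             # False
```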

Random measurement errors in the x and y signatures tend to bias the 
correlation towards zero. In most experiments, most genes are not signif- 
icantly affected but do show small random measurement errors. Selecting 
only the '95% confidence' genes for the correlation calculation, rather 
than the entire genome, reduces this bias and makes the actual biological 
correlations more apparent. 
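The signature correlation described in this section can be computed directly from two dicts of log10 ratios. This sketch assumes the 95% confidence calls are already available as sets of gene names. The sums are not mean-centred, so a profile correlated with itself gives 1. The function name and example genes are hypothetical.

```python
import math


def signature_correlation(x, y, regulated_x, regulated_y):
    """rho = sum(x*y) / sqrt(sum(x**2) * sum(y**2)) over genes regulated in either signature.

    x, y: dicts of log10 expression ratios keyed by gene.
    regulated_x, regulated_y: genes called regulated at 95% confidence.
    """
    genes = (set(regulated_x) | set(regulated_y)) & set(x) & set(y)
    sxy = sum(x[g] * y[g] for g in genes)
    sxx = sum(x[g] ** 2 for g in genes)
    syy = sum(y[g] ** 2 for g in genes)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one signature is all zeros
    return sxy / math.sqrt(sxx * syy)


x = {"GENE_A": 1.0, "GENE_B": -0.5, "GENE_C": 0.01}
y = {"GENE_A": 0.8, "GENE_B": -0.4, "GENE_C": -0.02}
# GENE_C is excluded; on the selected genes y = 0.8 * x, so rho is 1.
print(round(signature_correlation(x, y, {"GENE_A", "GENE_B"}, {"GENE_A"}), 6))  # 1.0
```

Restricting the sums to regulated genes is what keeps the many near-zero, noise-dominated ratios from pulling ρ toward zero.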

Correlations between a profile and itself are unity by definition. Error limits on the correlation are 95% confidence limits based on the individual measurement error bars, assuming uncorrelated errors (ref. 33). They do not include the bias mentioned above; thus, a departure of ρ from unity does not necessarily mean that the underlying biological correlation is imperfect. However, a correlation of 0.7 ± 0.1, for example, is very significantly different from zero. Small (|ρ| < 0.2) but formally significant correlations in the tables and text are probably due to small systematic biases in the Cy5/Cy3 ratios that violate the assumption of independent measurement errors used to generate the 95% confidence limits. Therefore, these small correlation values should be treated as not significant. A likely source of uncorrected systematic bias is the partially corrected scanner detector nonlinearity, which affects the Cy3 and Cy5 detection channels differently.

The 1 µg/ml FK506 treatment signature was compared with more than 40 unrelated deletion mutant strain or drug signatures. These control profiles had correlation coefficients with the FK506 profile that were distributed around zero (mean ρ = -0.03) with a standard deviation of 0.16 (data not shown), and none had a correlation greater than ρ = 0.38. Similarly, the calcineurin mutant strain signature correlated well with the CsA treatment signature (ρ = 0.71 ± 0.04) but not with the signatures from the negative controls (mean ρ = -0.02 with a standard deviation of 0.18).
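The control comparisons give an empirical null distribution for ρ. One simple way to read an observed value against it is a z-score. This is an illustrative calculation, not a test reported by the authors, and it assumes the control correlations are roughly normally distributed. The |ρ| < 0.2 rule from the previous paragraph is included as a separate check. Function names are hypothetical.

```python
def z_against_controls(rho, control_mean, control_sd):
    """Number of control standard deviations by which rho exceeds the control mean."""
    return (rho - control_mean) / control_sd


def treat_as_significant(rho, min_abs_rho=0.2):
    """Rule of thumb from the text: |rho| < 0.2 is treated as not significant."""
    return abs(rho) >= min_abs_rho


# Calcineurin mutant signature vs. CsA treatment, against the negative controls.
z = z_against_controls(0.71, -0.02, 0.18)
print(round(z, 2))                 # 4.06
print(treat_as_significant(0.18))  # False: e.g. the cph1 value in Table 2
```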

Quality controls. End-to-end checks on expression ratio measurement 
accuracy were provided by analyzing the variance in repeated hybridiza- 
tions using the same mRNA labeled with both Cy3 and Cy5, and also 
using Cy3 and Cy5 mRNA samples isolated from independent cultures of 
the same nominal strain and conditions. Biases undetected with this pro- 
cedure, such as gene-specific biases presumably due to differential incor- 
poration of Cy3- and Cy5-dUTP into cDNA, were minimized by doing 
hybridizations in fluor-reversed pairs, in which the Cy3/Cy5 labeling of 
the biological conditions was reversed in one experiment with respect to 
the other. The expression ratio for each gene is then the ratio of ratios be- 
tween the two experiments in the pair. Other biases are removed by algo- 
rithmic numerical de-trending. The magnitude of these biases in the 
absence of de-trending and fluor reversal is typically about 30% in the 
ratio, but may be as high as twofold for some ORFs. 
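The reason fluor reversal removes gene-specific dye bias can be shown with a short calculation. In this sketch we assume the bias multiplies a gene's Cy5/Cy3 ratio by the same factor in both hybridizations, so it cancels in the ratio of ratios. Because that ratio of ratios equals the squared test/reference ratio, the sketch halves its logarithm to return a value on the scale of a single ratio; that final rescaling, and the function name, are our assumptions.

```python
import math


def fluor_reversed_log10_ratio(ratio_a, ratio_b):
    """Combine a fluor-reversed pair of Cy5/Cy3 ratios for one gene.

    ratio_a: Cy5/Cy3 with the test condition labeled in Cy5.
    ratio_b: Cy5/Cy3 with the labels swapped (test condition in Cy3).
    A shared multiplicative dye bias cancels in ratio_a / ratio_b, which
    equals (test/reference) ** 2; half its log10 is log10(test/reference).
    """
    return 0.5 * (math.log10(ratio_a) - math.log10(ratio_b))


true_ratio, dye_bias = 3.0, 1.3
a = true_ratio * dye_bias   # test in Cy5: biased upward
b = dye_bias / true_ratio   # test in Cy3: same bias, inverted biology
print(10 ** fluor_reversed_log10_ratio(a, b))  # ≈ 3.0, bias removed
```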

Expression ratios are based on mean intensities over each spot. Some smaller spots have fewer image pixels in the average. This does not degrade accuracy noticeably until the number of pixels falls below ten, in which case the spot is rejected from the data set. 'Wander' of spot positions with respect to the nominal grid is adaptively tracked in array subregions by the image processing software. Unequal spot 'wander' within a subregion of more than half a spot spacing is a difficulty for the automated quantitation algorithms; in this case, the spot is rejected from analysis based on human inspection of the 'wander'. Any partially overlapping spots are excluded from the data set. Typically, less than 1% of spots are rejected for these reasons.
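The rejection criteria above can be expressed as a simple filter. This sketch automates checks that, for 'wander', were made by human inspection in the original procedure. The `Spot` record, its `wander` field (taken here as the residual offset after subregion tracking, in spot spacings) and the function names are hypothetical.

```python
from dataclasses import dataclass

MIN_PIXELS = 10   # spots averaged over fewer pixels are rejected
MAX_WANDER = 0.5  # maximum unequal wander, in units of the spot spacing


@dataclass
class Spot:
    gene: str
    n_pixels: int
    wander: float            # hypothetical: residual offset from the tracked grid
    overlaps_neighbor: bool


def passes_qc(spot):
    """Apply the three rejection criteria described in the text."""
    return (spot.n_pixels >= MIN_PIXELS
            and spot.wander <= MAX_WANDER
            and not spot.overlaps_neighbor)


def filter_spots(spots):
    """Return the kept spots and the fraction rejected."""
    kept = [s for s in spots if passes_qc(s)]
    rejected_fraction = 1 - len(kept) / len(spots) if spots else 0.0
    return kept, rejected_fraction
```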
The Transcriptional Program in

The temporal program of gene expression during a model physiological response of human cells, the response of fibroblasts to serum, was explored with a complementary DNA microarray representing about 8600 different human genes. Genes could be clustered into groups on the basis of their temporal patterns of expression in this program. Many features of the transcriptional program appeared to be related to the physiology of wound repair, suggesting that fibroblasts play a larger and richer role in this complex multicellular response than had previously been appreciated.



The response of mammalian fibroblasts to serum has been used as a model for studying growth control and cell cycle progression (1). Normal human fibroblasts require growth factors for proliferation in culture; these growth factors are usually provided by fetal bovine serum (FBS). In the absence of growth factors, fibroblasts enter a nondividing state, termed G0, characterized by low metabolic activity. Addition of FBS or purified growth factors induces proliferation of the fibroblasts; the changes in gene expression that accompany this proliferative response have been the subject of many studies, and the responses of dozens of genes to serum have been characterized.

We took a fresh look at the response of human fibroblasts to serum, using cDNA microarrays representing about 8600 distinct human genes to observe the temporal program of transcription that underlies this response. Primary cultured fibroblasts from human neonatal foreskin were induced to enter a quiescent state by serum deprivation for 48 hours and then stimulated by addition of medium containing 10% FBS (2). DNA microarray hybridization was used to measure the temporal changes in mRNA levels of 8613 human genes (3) at 12 times, ranging from 15 min to 24 hours after serum stimulation. The cDNA made from purified mRNA from each sample was labeled with the fluorescent dye Cy5 and mixed with a common reference probe consisting of cDNA made from purified mRNA from the quiescent culture (time zero) labeled with a second fluorescent dye, Cy3 (4). The color images of the hybridization results (Fig. 1) were made by representing the Cy3 fluorescent image as green and the Cy5 fluorescent image as red and merging the two color images.

Fig. 1. The same section of the microarray is shown for three independent hybridizations comparing RNA isolated at the 8-hour time point after serum treatment to RNA from serum-deprived cells. Each microarray contained 9996 elements, including 9804 human cDNAs, representing 8613 different genes. mRNA from serum-deprived cells was used to prepare cDNA labeled with Cy3-deoxyuridine triphosphate (dUTP), and mRNA harvested from cells at different times after serum stimulation was used to prepare cDNA labeled with Cy5-dUTP. The two cDNA probes were mixed and simultaneously hybridized to the microarray. The image of the subsequent scan shows genes whose mRNAs are more abundant in the serum-deprived fibroblasts (that is, suppressed by serum treatment) as green spots and genes whose mRNAs are more abundant in the serum-treated fibroblasts as red spots. Yellow spots represent genes whose expression does not vary substantially between the two samples. The arrows indicate the spots representing the following genes: 1, protein disulfide isomerase-related protein P5; 2, IL-8 precursor; 3, EST AA057170; and 4, vascular endothelial growth factor.
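The color rendering described above can be sketched per pixel: Cy5 drives the red channel and Cy3 the green channel, so equal signals give yellow. This assumes a linear scale clipped at a chosen maximum intensity, which is our simplification rather than the authors' display transform; the function name is hypothetical.

```python
def merge_channels(cy3, cy5, max_intensity):
    """Return an 8-bit (R, G, B) pixel with Cy5 in red and Cy3 in green.

    Intensities are clipped to [0, max_intensity] and scaled linearly.
    """
    def scale(value):
        clipped = min(max(value, 0.0), max_intensity)
        return round(255 * clipped / max_intensity)

    return (scale(cy5), scale(cy3), 0)


print(merge_channels(1000, 1000, 1000))  # (255, 255, 0): yellow, unchanged
print(merge_channels(0, 400, 1000))      # (102, 0, 0): red, higher after serum
```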

Diverse temporal profiles of gene expression could be seen among the 8613 genes surveyed in this experiment (Fig. 2); many of these genes (about half) were unnamed expressed sequence tags (ESTs) (5). Although diverse patterns of expression were observed, the orderly choreography of the expression program became apparent when the results were analyzed by a clustering and display method developed in our laboratory for analyzing genome-wide gene expression data (6). An example of such an analysis, here applied to a subset of 517 genes whose expression changed substantially in response to serum (7), is shown in Fig. 2. The entire detailed data set underlying Fig. 2 is available as a tab-delimited table (in cluster order) at the Science Web site (www.sciencemag.org/feature/data/984559.shl). In addition, the entire, larger data set for the complete set of genes analyzed in this experiment can be found at a Web site maintained by our laboratory (genome-www.stanford.edu/serum) (8).
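For readers unfamiliar with hierarchical clustering, the following is a minimal average-linkage sketch in the spirit of the method cited here (6). It is not the authors' software: it uses centred Pearson correlation as the similarity, recomputes similarities naively at every step, and is suitable only for small examples. Gene names and profiles are hypothetical.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Centred Pearson correlation of two equal-length expression profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def average_linkage(profiles):
    """Agglomerative clustering; cluster similarity is the mean pairwise correlation.

    profiles: dict gene -> list of log ratios over the time course.
    Returns the merges in order as (member_genes, similarity) tuples.
    """
    clusters = {i: [gene] for i, gene in enumerate(profiles)}
    next_id = len(clusters)
    merges = []

    def similarity(i, j):
        pairs = [(g, h) for g in clusters[i] for h in clusters[j]]
        return sum(pearson(profiles[g], profiles[h]) for g, h in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the most similar pair of current clusters.
        (i, j), sim = max(
            ((pair, similarity(*pair)) for pair in combinations(clusters, 2)),
            key=lambda item: item[1],
        )
        merged = clusters.pop(i) + clusters.pop(j)
        clusters[next_id] = merged
        merges.append((merged, sim))
        next_id += 1
    return merges


profiles = {"g1": [0, 1, 2, 3], "g2": [0, 1.1, 2.1, 2.9], "g3": [3, 2, 1, 0]}
for members, sim in average_linkage(profiles):
    print(members, round(sim, 3))
```

The order of merges defines the tree; displaying genes in that order is what places co-regulated genes, such as the immediate-early group, next to each other in a cluster image.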

One measure of the reliability of the 
changes we observed is inherent in the ex- 
pression profiles of the genes. For most genes 
whose expression levels changed, we could 
see a gradual change over a few time points, 
which thus effectively provided independent 
measurements for almost all of the observa- 
tions. An additional check was provided by 
the inclusion of duplicate and, in a few cases, 
multiple array elements representing the 
same gene for about 5% of the genes included 
in this microarray. In addition, three indepen- 
dent hybridizations to different microarrays 
with mRNA samples from cells harvested 8 
hours after serum addition showed good cor- 
relation (Fig. 1). As an independent test, we 
measured the expression levels of several 
genes using the TaqMan 5' nuclease fluori- 
genic quantitative polymerase chain reaction 
(PCR) assay (9). The expression profiles of 
the genes, as measured by these two indepen- 
dent methods, were very similar (Fig. 3) (10). 

The transcriptional response of fibroblasts 
to serum was extremely rapid. The immediate 
response to serum stimulation was dominated 
by genes that encode transcription factors 
and other proteins involved in signal trans- 
duction. The mRNAs for several genes [in- 
cluding c-FOS, JUN B, and mitogen-activated
protein (MAP) kinase phosphatase-1
(MKP1)] were detectably induced within 
15 min after serum stimulation (Fig. 4, A 
and B). Fifteen of the genes that were 
observed to be induced by serum encode 
known or suspected regulators of transcrip- 
tion (Fig. 4B). All but one were immediate- 
early genes — their induction was not inhib- 
ited by cycloheximide (11). This class of 
genes could be divided into those
whose induction was transient (Fig. 2, clus- 
ter E) and those whose mRNA levels re- 
mained induced for much longer (Fig. 2, 
clusters I and J). Some features of the 
immediate response appeared to be directed 
at adaptation to the initiating signals. We 
observed a marked induction of mRNA 
encoding MKP1, a dual-specificity phos- 
phatase that modulates the activity of the 
ERK1 and ERK2 MAP kinases (12). The 
coincidence of the peak of expression of
genes in cluster E (Fig. 2) with that of
MKP1 (Fig. 4A) suggests the possibility
that continued activity of the MAP kinase
pathway is required to maintain induction of
these genes but not of those with sustained
expression (clusters I and J). The gene encoding
a second member of the dual-specificity MAP
kinase phosphatase family, known as
dual-specificity protein phosphatase 6/pyst2, was
induced later, at about 4 hours after serum
stimulation. Genes encoding diverse other
proteins with roles in signal transduction,
ranging from cell-surface receptors [for example,
the sphingosine 1-phosphate receptor (EDG-1),
the vascular endothelial growth factor receptor,
and the type II BMP receptor] to regulators of
G-protein signaling (for example, NET1/p115 rho
GEF) to DNA-binding transcription factors, were
induced by serum (Fig. 4A).

Fig. 2. Cluster image showing the different classes of gene expression
profiles. Five hundred seventeen genes whose mRNA levels changed in
response to serum stimulation were selected (7). This subset of genes was
clustered hierarchically into groups on the basis of the similarity of
their expression profiles by the procedure of Eisen et al. (6). The
expression pattern of each gene in this set is displayed here as a
horizontal strip. For each gene, the ratio of mRNA levels in fibroblasts
at the indicated time after serum stimulation ("unsync" denotes
exponentially growing cells) to its level in the serum-deprived (time
zero) fibroblasts is represented by a color, according to the color scale
at the bottom. The graphs show the average expression profiles for the
genes in the corresponding "cluster" (indicated by the letters A to J and
color coding). In every case examined, when a gene was represented by more
than one array element, the multiple representations in this set were seen
to have identical or very similar expression profiles, and the profiles
corresponding to these independent measurements clustered either adjacent
or very close to each other, pointing to the robustness of the clustering
algorithm in grouping genes with very similar patterns of expression.

The reprogramming of the regulatory cir- 
cuits in response to serum involved not only 
induction of transcription factors but also re- 
duced expression of many transcriptional reg- 
ulators — some of which may play roles in 
maintaining the cells in G0 or in priming
them to react to wounding (Fig. 4C). Perhaps 
as a consequence of the historical focus on 
genes induced by serum stimulation of fibro- 
blasts, the set of transcription factors whose 
expression diminished upon serum stimula- 
tion has been less well characterized. 

Genes known or likely to be involved in
controlling and mediating the proliferative
response showed distinctive patterns of regulation.
Several genes whose products inhibit progression
of the cell-division cycle, such as p27Kip1,
p57Kip2, and p18, were expressed in the
quiescent fibroblasts and down-regulated before
the onset of cell division. The nadir in the
mRNA levels for these genes occurred between
6 and 12 hours after serum stimulation (Fig.
5A), coincident with the passage of the
fibroblasts through G1. The levels of the transcript
encoding the WEE1-like protein kinase, which
is believed to inhibit mitosis by phosphorylation
of Cdc2, diminished between 4 hours and 8 to
12 hours after serum addition (Fig. 5A), well
before the onset of M phase at around 16 hours,
raising the possibility of an additional role for
Wee1 in an earlier stage of the cell cycle or in
regulating the G0 to G1 transition. Several
genes induced in the first few hours after serum
stimulation, such as the helix-loop-helix proteins
ID2 and ID3 and EST AA016305, a gene
with homology to G1-S cyclins, are candidates
for roles in promoting the exit from G0.

Genes involved in mediating progression 
through the cell cycle were characterized by a 
distinctive pattern of expression (Fig. 2, clus- 
ter D), reflecting the coincidence of their 
expression with the reentry of the stimulated 
fibroblasts into the cell-division cycle. The 
stimulated fibroblasts replicated their DNA 
about 16 hours after serum treatment. This 
timing was reflected by the induction of 
mRNA encoding both subunits of ribonucle- 
otide reductase and PCNA, the processivity 
factor for DNA polymerase epsilon and delta. 
Cyclin A, Cyclin B1, Cdc2, and CDC28 kinase,
regulators of passage through the S
phase and the transition from G2 to M phase,
were induced at about 16 to 20 hours after
serum addition. The kinase in the Cyclin
B1-CDK pair needs to be activated by
phosphorylation. The gene encoding Cyclin-dependent
kinase 7 (CDK7; a homolog of Xenopus
MO15 cdk-activating kinase) was induced
in parallel with the Cdc2 and Cdc28
kinases (Fig. 5A), suggesting a potential role
for CDK7 in mediating M phase. DNA
topoisomerase IIα, required for chromosome
segregation at mitosis; Mad2, a component of
the spindle checkpoint that prevents comple- 
tion of mitosis (anaphase) if chromosomes 
are not attached to the spindle; and the kinet- 
ochore protein CENP-F all showed a similar 
expression profile. 

In the hours after the serum stimulus, one of 
the most striking features of the unfolding tran- 
scriptional program was the appearance of nu- 
merous genes with known roles in processes 
relevant to the physiology of wound healing. 
These included both genes involved in the di- 
rect role played by fibroblasts in remodeling of 
the clot and the extracellular matrix and, more 
notably, genes encoding proteins involved in 
intercellular signaling (Fig. 5). Genes induced 
in this program encode products that can (i) 
participate in the dynamic process of clotting, 
clot dissolution, and remodeling and perhaps 
contribute to hemostasis by promoting local 
vasoconstriction (for example, endothelin-1); 
(ii) promote chemotaxis and activation of neu- 
trophils (for example, COX2) and recruitment 
and extravasation of monocytes and macro- 
phages (for example, MCP1); (iii) promote 
chemotaxis and activation of T lymphocytes 
[for example, interleukin-8 (IL-8)] and B 
lymphocytes (for example, ICAM-1), thus 
providing both innate and antigen-specific 
defenses against wound infection and recruit- 
ing the phagocytic cells that will be required 
to clear out the debris during remodeling of
the wound; (iv) promote angiogenesis and 
neovascularization (for example, VEGF) 
through newly forming tissue; (v) promote 
migration and proliferation of fibroblasts (for 
example, CTGF) and their differentiation into 
myofibroblasts (for example, Vimentin); and 
(vi) promote migration and proliferation of 
keratinocytes, leading to reepithelialization
of the wound (for example, FGF7), and pro- 
mote proliferation of melanocytes, perhaps 
contributing to wound hyperpigmentation 
(for example, FGF2). 

Coordinated regulation of groups of genes 
whose products act at different steps in a 
common process was a recurring theme. For 
example, Furin, a prohormone-processing 
protease required for one of the processing 
steps in the generation of active endothelin, 
was induced in parallel with induction of the 
gene encoding the precursor of endothelin-1 
(Fig. 5E) (13). Conversely, expression of
CALLA/CD10, a membrane metalloprotease
that degrades endothelin-1 and other peptide 
mediators of acute inflammation, was reduced.
A second example is provided by a set
of five genes involved in the biosynthesis of
cholesterol (Fig. 5I). The mRNAs encoding
each of these enzymes showed sharply diminished
expression beginning 4 to 6 hours after
serum stimulation of fibroblasts. A likely
explanation for the coordinated down-regulation
of the cholesterol biosynthetic pathway
is that serum provides cholesterol to fibroblasts
through low-density lipoproteins,
whereas in the absence of the cholesterol
provided by serum, endogenous cholesterol
biosynthesis in fibroblasts is required.

Fig. 3. Independent verification of microarray quantitation. Relative mRNA
levels of the indicated genes (Mast, mast/stem cell growth factor receptor)
were measured with the TaqMan 5' nuclease fluorigenic quantitative PCR
assay (9) (left) in the same samples that were used to prepare probes for
microarray hybridizations (right). Data from the TaqMan analysis were
normalized to mRNA concentrations and plotted relative to the level at
time zero, so that the results could be compared with those from the
microarray hybridizations. In general, quantitation with the two methods
gave very similar results (10).

Many of the previously studied genes that 
we observed to be regulated in this program 
have no recognized role in any aspect of wound 
healing or fibroblast proliferation. Their identi- 
fication in this study may therefore point to 
previously unknown aspects of these processes. 
A few selected genes in this group are shown in 
Fig. 5H. The stanniocalcin gene, for example 
(Fig. 5H), encodes a secreted protein without a 
clearly identified function in human cells (14,
15). Its induction in serum-stimulated fibroblasts
suggests the possibility that it may play a
role in the wound-healing process, perhaps
serving as a signal in mediating inflammation
or angiogenesis.

Fig. 4. "Reprogramming" of fibroblasts. Expression
profiles of genes whose function is likely to
play a role in the reprogramming phase of the
response are shown with the same representation
as in Fig. 2. In the cases in which a gene
was represented by more than one element in
the microarray, all measurements are shown.
The genes were grouped into categories on the
basis of our knowledge of their most likely role.
Some genes with pleiotropic roles were included
in more than one category.

One of the most important results of this 
exploration was the discovery of over 200 pre- 
viously unknown genes whose expression was 
regulated in specific temporal patterns during 
the response of fibroblasts to serum. For example,
13 of the 40 genes in cluster D (Fig. 2) have
descriptive names that reflect their putative
function. Nine of these 13 genes (69%) encode
proteins that play roles in cell cycle progression,
particularly in DNA replication and the
G2-M transition. This enrichment for cell
cycle-related genes suggests that some of the
unnamed genes in this cluster (for example,
EST W79311 and EST R13146, neither of
which has sequence similarity to previously
characterized genes) may represent previously
unknown genes involved in this part of the cell 
cycle. Similarly, a remarkable fraction of genes 
that were grouped into cluster F on the basis of 
their expression profiles encoded proteins in- 
volved in intercellular signaling (Fig. 2), sug- 
gesting that a similar role should be considered 
for the many unnamed genes in this cluster. A 
disproportionately large fraction of the genes 
whose transcription diminished upon serum 
stimulation were unnamed ESTs. 
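The "9 of 13 (69%)" observation can be framed as a formal enrichment test. The sketch below does this, although no such test is reported here. It computes the hypergeometric upper-tail probability of drawing at least k category members among n named genes. The background counts (60 cell cycle genes among 500 named genes) are purely hypothetical and would have to be replaced with real annotations for the array.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k): chance of drawing at least k category genes when n genes
    are sampled without replacement from N genes, K of which are in the category."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


# Observed in cluster D: 9 of the 13 named genes are cell cycle-related.
fraction = 9 / 13
print(f"{fraction:.0%}")  # 69%

# Hypothetical background, for illustration only: 60 cell cycle genes among
# 500 named genes on the array.
p = hypergeom_upper_tail(k=9, n=13, K=60, N=500)
print(f"P(at least 9 of 13) = {p:.2e}")
```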

Fig. 5. The transcriptional response to serum suggests a multifaceted role for fibroblasts in the
physiology of wound healing. The features of the transcriptional program of fibroblasts in response
to serum stimulation that appear to be related to various aspects of the wound-healing process and
fibroblast proliferation are shown with the same convention for representing changes in transcript
levels as was used in Figs. 2 and 4. (A) Cell cycle and proliferation, (B) coagulation and hemostasis,
(C) inflammation, (D) angiogenesis, (E) tissue remodeling, (F) cytoskeletal reorganization, (G)
reepithelialization, (H) unidentified role in wound healing, and (I) cholesterol biosynthesis. The
numbers in (C) and (G) refer to genes whose products serve as signals to neutrophils (C1),
monocytes and macrophages (C2), T lymphocytes (C3), B lymphocytes (C4), and melanocytes (G1).

Our intention was to use this experiment as
a model to study the control of the transition
from G0 to a proliferating state. However, one
of the defining characteristics of genome-scale 
expression profiling experiments is that the ex- 
amination of so many diverse genes opens a 
window on all the processes that actually occur 
and not merely the single process one intended 
to observe. Serum, the soluble fraction of clot- 
ted blood, is normally encountered by cells in 
vivo in the context of a wound. Indeed, the 
expression program that we observed in re- 
sponse to serum suggests that fibroblasts are 
programmed to interpret the abrupt exposure to 
serum not as a general mitogenic stimulus but 
as a specific physiological signal, signifying a 
wound. The proliferative response that we orig- 
inally intended to study appeared to be part of a 
larger physiological response of fibroblasts to a 
wound. Other features of the transcriptional 
response to serum suggest that the fibroblast is 
an active participant in a conversation among 
the diverse cells that work together in wound 
repair, interpreting, amplifying, modifying, and 
broadcasting signals controlling inflammation, 
angiogenesis, and epithelial regrowth during 
the response to an injury. 

We recognize that these in vitro results 
almost certainly represent a distorted and in- 
complete rendering of the normal physiolog- 
ical response of a fibroblast to a wound. 
Moreover, only the responses elicited directly 
by exposure of fibroblasts to serum were 
examined. The subsequent signals from other 
cellular participants in the normal wound- 
healing process would certainly provoke fur- 
ther evolution of the transcriptional program 
in fibroblasts at the site of a wound, which 
this experiment cannot reveal. Nevertheless, 
we believe that the picture that emerged 
strongly suggests a much larger and richer 
role for the fibroblast in the orchestration of 
this important physiological process than had 
previously been suspected. 
Systematic variation in gene expression
patterns in human cancer cell lines 

Douglas T. Ross 1 , Uwe Scherf 5 , Michael B. Eisen 2 , Charles M. Perou 2 , Christian Rees 2 , Paul Spellman 2 , 
Vishwanath Iyer 1 , Stefanie S. Jeffrey 3 , Matt Van de Rijn 4 , Mark Waltham 5 , Alexander Pergamenschikov 2 , 
Jeffrey C.F. Lee 6 , Deval Lashkari 7 , Dari Shalon 6 , Timothy G. Myers 8 , John N. Weinstein 5 , David Botstein 2 
& Patrick O. Brown 1,9

We used cDNA microarrays to explore the variation in expression of approximately 8,000 unique genes among the 
60 cell lines used in the National Cancer Institute's screen for anti-cancer drugs. Classification of the cell lines based 
solely on the observed patterns of gene expression revealed a correspondence to the ostensible origins of the 
tumours from which the cell lines were derived. The consistent relationship between the gene expression patterns 
and the tissue of origin allowed us to recognize outliers whose previous classification appeared incorrect. Specific 
features of the gene expression patterns appeared to be related to physiological properties of the cell lines, such 
as their doubling time in culture, drug metabolism or the interferon response. Comparison of gene expression pat- 
terns in the cell lines to those observed in normal breast tissue or in breast tumour specimens revealed features of 
the expression patterns in the tumours that had recognizable counterparts in specific cell lines, reflecting the 
tumour, stromal and inflammatory components of the tumour tissue. These results provided a novel molecular 
characterization of this important group of human cell lines and their relationships to tumours in vivo. 



Introduction 

Cell lines derived from human tumours have been extensively used 
as experimental models of neoplastic disease. Although such cell 
lines differ from both normal and cancerous tissue, the inaccessi- 
bility of human tumours and normal tissue makes it likely that 
such cell lines will continue to be used as experimental models for 
the foreseeable future. The National Cancer Institute's Develop- 
mental Therapeutics Program (DTP) has carried out intensive 
studies of 60 cancer cell lines (the NCI60) derived from tumours 
from a variety of tissues and organs 1-4 . The DTP has assessed many
molecular features of the cells related to cancer and chemothera- 
peutic sensitivity, and has measured the sensitivities of these 60 cell 
lines to more than 70,000 different chemical compounds, includ- 
ing all common chemotherapeutics (http://dtp.nci.nih.gov). A 
previous analysis of these data revealed a connection between the 
pattern of activity of a drug and its method of action. In particular, 
there was a tendency for groups of drugs with similar patterns of 
activity to have related methods of action 33 " 7 . 

We used DNA microarrays to survey the variation in abun- 
dance of approximately 8,000 distinct human transcripts in these 
60 cell lines. Because of the logical connection between the function
of a gene and its pattern of expression, correlating gene expression
patterns with phenotypic variation among the cells can be a first step
toward inferring the function of a gene. Similarly, the patterns of
expression of known genes can
reveal novel phenotypic aspects of the cells and tissues studied 8-10 . 
Here we present an analysis of the observed patterns of gene 
expression and their relationship to phenotypic properties of the 
60 cell lines. The accompanying report 11 explores the relationship
between the gene expression patterns and the drug sensitivity pro- 
files measured by the DTP. The assessment of gene expression pat- 
terns in a multitude of cell and tissue types, such as the diverse set 
of cell lines we studied here, under diverse conditions in vitro and 
in vivo, should lead to increasingly detailed maps of the human 
gene expression program and provide clues as to the physiological 
roles of uncharacterized genes 11-16 . The databases, plus tools for 
analysis and visualization of the data, are available
(http://genome-www.stanford.edu/nci60 and http://discover.nci.nih.gov).

Results 

We studied gene expression in the 60 cell lines using DNA 
microarrays prepared by robotically spotting 9,703 human 
cDNAs on glass microscope slides 17,18 . The cDNAs included 
approximately 8,000 different genes: approximately 3,700 repre- 
sented previously characterized human proteins, an additional 
1,900 had homologues in other organisms and the remaining 
2,400 were identified only by ESTs. Because of ambiguity in the identity
of the cDNA clones used in these studies, we estimated that
approximately 80% of the genes in these experiments were correctly
identified. The identities of approximately 3,000 cDNAs
from these experiments have been sequence-verified, including
all of those referred to here by name.

Fig. 1 Gene expression patterns related to the tissue of origin of the cell lines. Two-dimensional
hierarchical clustering was applied to expression data from a set of 1,161 cDNAs
measured across 64 cell lines. The 1,161 cDNAs were those (of 9,703 total) with transcript
levels that varied by at least sevenfold (log2(ratio) > 2.8) relative to the reference pool in at
least 4 of 60 cell lines. This effectively selected genes with the greatest variation in expression
level across the 60 cell lines (including those genes not well represented in the reference
pool), and therefore highlighted those gene expression patterns that best
distinguished the cell lines from one another. Data from 64 hybridizations were used, one
for each cell line plus the two additional independent representations of each of the cell
lines K562 and MCF7. The two cell lines represented in triplicate were correspondingly
weighted for the gene clustering so that each of the 60 cell lines contributed equally to the
clustering. a, The cell-line dendrogram, with the terminal branches coloured to reflect the
ostensible tissue of origin of the cell line (red, leukaemia; green, colon; pink, breast;
purple, prostate; light blue, lung; orange, ovarian; yellow, renal; grey, CNS; brown, melanoma;
black, unknown (NCI/ADR-RES)). The scale to the right of the dendrogram depicts the
correlation coefficient represented by the length of the dendrogram branches connecting
pairs of nodes. Note that the two triplets of replicated cell lines (K562 and MCF7) cluster
tightly together and were well differentiated from even the most closely related cell lines,
indicating that this clustering of cell lines is based on characteristic variations in their gene
expression patterns rather than artefacts of the experimental procedures. b, A coloured
representation of the data table, with the rows (genes) and columns (cell lines) in cluster
order. The dendrogram representing hierarchical relationships between genes was omitted
for clarity, but is available (http://genome-www.stanford.edu/nci60). The colour in each
cell of this table reflects the mean-adjusted expression level of the gene (row) and cell line
(column). The colour scale used to represent the expression ratios is shown. The labels
'3a-3d' in (b) refer to the clusters of genes shown in detail in Fig. 3.
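The gene-selection rule in the Fig. 1 caption (at least sevenfold variation relative to the reference pool in at least 4 of the 60 lines) can be sketched as a simple filter. This sketch makes two assumptions. First, variation in either direction counts, so the absolute log2 ratio is compared with 2.8. Second, missing measurements are skipped. The function name and the example profiles are hypothetical.

```python
import math

SEVENFOLD_LOG2 = 2.8  # log2(7) is about 2.807; the caption rounds to 2.8
MIN_CELL_LINES = 4


def passes_variation_filter(log2_ratios, threshold=SEVENFOLD_LOG2,
                            min_lines=MIN_CELL_LINES):
    """True if |log2(ratio)| exceeds threshold in at least min_lines cell lines.

    Assumes induction and repression relative to the reference both count;
    None marks a missing or poorly measured spot and is ignored.
    """
    hits = sum(1 for v in log2_ratios if v is not None and abs(v) > threshold)
    return hits >= min_lines


# Hypothetical genes measured across 60 cell lines.
gene_x = [3.1, -2.9, 0.2, 3.5, 2.81] + [0.0] * 55  # exceeds the cutoff in 4 lines
gene_y = [3.1, 0.5, None, -1.0] + [0.0] * 56       # exceeds it in only 1 line
print(passes_variation_filter(gene_x))  # True
print(passes_variation_filter(gene_y))  # False
```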

Each hybridization compared Cy5-labelled cDNA reverse transcribed
from mRNA isolated from one of the cell lines with Cy3-labelled
cDNA reverse transcribed from a reference mRNA
sample. This reference sample, used in all hybridizations, was 
prepared by combining an equal mixture of mRNA from 12 of 
the cell lines (chosen to maximize diversity in gene expression as 
determined primarily from two-dimensional gel studies 2 ). By 
comparing cDNA from each cell line with a common reference, 
variation in gene expression across the 60 cell lines could be 
inferred from the observed variation in the normalized Cy5/Cy3 
ratios across the hybridizations. 
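Because every hybridization shares the same reference, two cell lines can be compared through their log ratios: log2(A/R) - log2(B/R) = log2(A/B), so the reference cancels. The sketch below illustrates that idea. Median-centering each array is an assumed normalization, since the text says only that the ratios were normalized. The intensities are hypothetical.

```python
import math
from statistics import median


def normalized_log_ratios(cy5, cy3):
    """log2(Cy5/Cy3) for each spot, shifted so the array's median log ratio is 0.

    Median-centering is an assumed global normalization; it corrects for one
    channel being uniformly brighter on a given array.
    """
    logs = [math.log2(red / green) for red, green in zip(cy5, cy3)]
    center = median(logs)
    return [v - center for v in logs]


# Hypothetical intensities for three genes; Cy3 is the shared reference on both arrays.
reference = [100.0, 100.0, 100.0]
line_a = normalized_log_ratios([200.0, 400.0, 800.0], reference)
line_b = normalized_log_ratios([100.0, 400.0, 1600.0], reference)

# log2(A/R) - log2(B/R) = log2(A/B): the common reference cancels out.
a_vs_b = [a - b for a, b in zip(line_a, line_b)]
print(line_a)  # [-1.0, 0.0, 1.0]
print(a_vs_b)  # [1.0, 0.0, -1.0]
```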

To assess the contribution of artefactual sources of variation in 
the experimentally measured expression patterns, K562 and 
MCF7 cell lines were each grown in three independent cultures, 
and the entire process was carried out independently on mRNA 
extracted from each culture. The variance in the triplicate fluo- 
rescence ratio measurements approached a minimum when the 
fluorescence signal was greater than approximately 0.4% of the 
measurable total signal dynamic range above background in 
either channel of the hybridization. We selected the subset of 
spots for which significant signal was present in both the numer- 
ator and denominator of the ratios by this criterion to identify 
the best-measured spots. The pair-wise correlation coefficients 
for the triplicates of the set of genes that passed this quality con- 
trol level (6,992 spots included for the MCF7 samples and 6,161 
spots for K562) ranged from 0.83 to 0.92 (for graphs and details, 
see http://genome-www.stanford.edu/nci60).
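The spot-quality filter and the replicate comparison described above can be sketched as follows. This is not the study's pipeline. The 16-bit dynamic range (65,535) and per-spot background subtraction are assumptions, and the three hypothetical replicates contain only five spots each.

```python
import math
from itertools import combinations

DYNAMIC_RANGE = 65535  # assumed 16-bit scanner output; not stated in the text
MIN_FRACTION = 0.004   # "approximately 0.4% of the ... dynamic range above background"


def passes(spot, dynamic_range=DYNAMIC_RANGE):
    """spot = (cy5, cy5_bg, cy3, cy3_bg); both channels must clear the cutoff."""
    cy5, cy5_bg, cy3, cy3_bg = spot
    cutoff = MIN_FRACTION * dynamic_range
    return cy5 - cy5_bg > cutoff and cy3 - cy3_bg > cutoff


def log_ratio(spot):
    """Background-subtracted log2(Cy5/Cy3)."""
    cy5, cy5_bg, cy3, cy3_bg = spot
    return math.log2((cy5 - cy5_bg) / (cy3 - cy3_bg))


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def replicate_correlations(replicates):
    """Pairwise Pearson r between replicate hybridizations.

    replicates: list of hybridizations, each a list of spots in the same array
    order. Only spots passing the filter in every replicate are compared.
    """
    n_spots = len(replicates[0])
    keep = [i for i in range(n_spots) if all(passes(rep[i]) for rep in replicates)]
    logs = [[log_ratio(rep[i]) for i in keep] for rep in replicates]
    corrs = {(a, b): pearson(logs[a], logs[b])
             for a, b in combinations(range(len(replicates)), 2)}
    return corrs, keep


# Three hypothetical replicate cultures, five spots each (spot 3 is too dim in Cy5).
rep1 = [(2100, 100, 1100, 100), (4100, 100, 1100, 100), (600, 100, 1100, 100),
        (300, 100, 1100, 100), (1100, 100, 2100, 100)]
rep2 = [(2300, 100, 1100, 100), (3900, 100, 1100, 100), (650, 100, 1100, 100),
        (290, 100, 1100, 100), (1000, 100, 2100, 100)]
rep3 = [(1900, 100, 1100, 100), (4300, 100, 1100, 100), (560, 100, 1100, 100),
        (310, 100, 1100, 100), (1200, 100, 2100, 100)]
corrs, keep = replicate_correlations([rep1, rep2, rep3])
print(keep)  # [0, 1, 2, 4]
for pair, r in corrs.items():
    print(pair, round(r, 3))
```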

To make the orderly features in the data more apparent, we used 
a hierarchical clustering algorithm 19,20 and a pseudo-colour visualization
matrix 3,21 . The object of the clustering was to group cell
lines with similar repertoires of expressed genes and to group 
genes whose expression level varied among the 60 cell lines in a 
similar manner. Clustering was performed twice using different 
subsets of genes to assess the robustness of the analysis. In one case 
(Fig. 1), we concentrated on those genes that showed the most 
variation in expression among the 60 cell lines (1,167 total). A sec- 
ond analysis (Fig. 2) included all spots that were thought to be well 
measured in the reference set (6,831 spots). 

Gene expression patterns related to the histologic origins of the cell lines

The most notable property of the clustered data was that cell lines 
with common presumptive tissues of origin grouped together 
(Figs 1a and 2). Cell lines derived from leukaemia, melanoma,
central nervous system, colon, renal and ovarian tissue were clus- 
tered into independent terminal branches specific to their respec- 
tive organ types with few exceptions. Cell lines derived from 
non-small-cell lung carcinoma and breast tumours were distributed
in multiple different terminal branches suggesting that their gene 
expression patterns were more heterogeneous. 

Many of these coherent cell line clusters were distinguished by 
the specific expression of characteristic groups of genes 
(Fig. 3a-d). For example, a cluster of approximately 90 genes was 
highly expressed in the melanoma-derived lines (Fig. 3c). This set 
was enriched for genes with known roles in melanocyte biology, 
including tyrosinase and dopachrome tautomerase (TYR and 
DCT; two subunits of an enzyme complex involved in melanin 
synthesis 22 ), MART1 (MLANA; which is being investigated as a
target for immunotherapy of melanoma 23 ) and S100-β (S100B;
which has been used as an antigenic marker in the diagnosis of
melanoma). LOXIMVI, the seventh line designated as melanoma
in the NCI60, did not show this characteristic pattern. Although
isolated from a patient with melanoma, LOXIMVI has previously
been noted to lack melanin and other markers useful for
identification of melanoma cells 1 .

Fig. 2 Gene expression patterns related to other cell-line phenotypes. a, We applied
two-dimensional hierarchical clustering to expression data from a set of 6,831 cDNAs
measured across the 64 cell lines. The 6,831 cDNAs were those with a minimum fluorescence
signal intensity of approximately 0.4% of the dynamic range above background in
the reference channel in each of the six hybridizations used to establish reproducibility.
This effectively selected those spots that provided the most reliable ratio measurements
and therefore identified a subset of genes useful for exploring patterns among genes whose
variation in expression across the 60 cell lines was of moderate magnitude.
b, Cluster-ordered data table. c, Doubling time of cell lines. Cell lines are given in cluster
order. Values are plotted relative to the mean. Doubling times greater than the mean are
shown in green; those with doubling times less than the mean are shown in red. d, Three
related gene clusters that were enriched for genes whose expression level variation was
correlated with cell line proliferation rate. Each of the three gene clusters (clustered
solely on the basis of their expression patterns) showed enrichment for sets of genes
involved in distinct functional categories (for example, ribosomal genes versus genes
involved in pre-RNA splicing). e, Gene cluster in which all characterized and
sequence-verified cDNAs encode genes known to be regulated by interferons. f, Gene cluster
enriched for genes that have been implicated in drug metabolism (indicated by asterisks).
A further property of the gene clustering evident here and in Fig. 2 is the strong tendency
for redundant representations of the same gene to cluster immediately adjacent to one
another, even within larger groups of genes with very similar expression patterns. In
addition to illustrating the reproducibility and consistency of the measurements, and
providing independent confirmation of many of our measurements, this property also
demonstrates that these, and probably all, genes have nearly unique patterns of variation
across the 60 cell lines. If this were not the case, and multiple genes had identical
patterns of variation, we would not expect to be able to distinguish, by clustering on the
basis of expression variation, duplicate copies of individual genes from the other genes
with identical expression patterns.
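Fig. 2c plots doubling times relative to the mean, and Fig. 1 colours mean-adjusted expression levels. The following minimal sketch shows one way a gene cluster might be summarized and related to doubling time. Each gene is mean-centered across cell lines, the centered rows are averaged into one cluster score per line, and that score is correlated with doubling time. The cluster rows and doubling times are hypothetical, and averaging centered rows is an assumed summary rather than the authors' stated method.

```python
import math


def mean_center(values):
    """Subtract the mean so each value is expressed relative to the average."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster_score(rows):
    """Average of mean-centered rows: one summary value per cell line."""
    centered = [mean_center(row) for row in rows]
    return [sum(col) / len(col) for col in zip(*centered)]


# Hypothetical data for five cell lines (not values from the NCI60 screen).
doubling_hours = [20.0, 30.0, 40.0, 50.0, 60.0]
cluster_rows = [
    [2.0, 1.5, 1.0, 0.5, 0.0],  # log2 ratios for one gene in the cluster
    [1.8, 1.6, 0.9, 0.4, 0.2],  # a second gene with a similar pattern
]
score = cluster_score(cluster_rows)
r = pearson(score, doubling_hours)
print(mean_center(doubling_hours))  # [-20.0, -10.0, 0.0, 10.0, 20.0]
print(round(r, 3))
```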

Paradoxically, two related cell lines (MDA-MB435 and MDA-N),
which were derived from a single patient with breast cancer
and have been conventionally regarded as breast cancer cell lines,
shared expression of the genes associated with melanoma.
MDA-MB435 was isolated from a pleural effusion in a patient with
metastatic ductal adenocarcinoma of the breast 24,25 . It remains
possible that the origin of the cell line was a breast cancer, and that 
its gene expression pattern is related to the neuroendocrine fea- 
tures of some breast cancers 26 . But our results suggest that this cell 
line may have originated from a melanoma, raising the possibility 
that the patient had a co-existing occult melanoma. 

The higher-level organization of the cell-line tree (in which
groups span cell lines from different tissue types) also reflected
shared biological properties of the tissues from which the cell
lines were derived. The carcinoma-derived cell lines were divided
into major branches that separated those that expressed genes 
characteristic of epithelial cells from those that expressed genes 
more typical of stromal cells. A cluster of genes is shown (Fig. 3b) 
that is most strongly expressed in cell lines derived from colon 
carcinomas, six of seven ovarian-derived cell lines and the two 
breast cancer lines positive for the oestrogen receptor. The named 
genes in this cluster have been implicated in several aspects of 
epithelial cell biology 27 . The cluster was enriched for genes whose 
products are known to localize to the basolateral membrane of 
epithelial cells, including those encoding components of 
adherens complexes (for example, desmoplakin (DSP), 
periplakin (PPL) and plakoglobin (JUP)), an epithelial-expressed
cell-cell adhesion molecule (M4S1) and a sodium/hydrogen
ion exchanger 28-31 (SLC9A1). It also contained genes
that encode putative transcriptional regulators of epithelial
morphogenesis, a human homologue of a Drosophila melanogaster
epithelial-expressed tumour suppressor (LLGL1) and a homeobox
gene thought to control calcium-mediated adherence in
epithelial cells 32,33 (MSX2).

In contrast, a separate, major branch of the cell-line dendrogram
(Fig. 1a) included all glioblastoma-derived cell lines, all
renal-cell-carcinoma-derived cell lines and the remaining carci- 
noma-derived lines. The characteristic set of genes expressed in 
this cluster included many whose products are involved in stromal
cell functions (Fig. 3d). Indeed, the two cell lines originally
described as 'sarcoma-like' in appearance (Hs578T, breast
carcinosarcoma, and SF539, gliosarcoma) expressed most of these
genes 34,35 . Although no single gene was uniformly characteristic
of this cluster, each cell line showed a distinctive pattern of 
expression of genes encoding proteins with roles in synthesis or 
modification of the extracellular matrix (for example, caldesmon 
(CALD1), cathepsins, thrombospondin (THBS), lysyl oxidase 
(LOX) and collagen subtypes). Although the ovarian and most 
non-small-cell-lung-derived carcinomas expressed genes charac- 
teristic of both epithelial cells and stromal cells, they probably 
clustered with the CNS and renal cell carcinomas in this analysis 
because genes characteristically expressed in stromal cells were 
more abundantly represented in this gene set. 

Physiological variation reflected in gene expression patterns

A cluster diagram of 6,831 genes (Fig. 2) is useful for exploring
clusters of genes whose variation in mRNA levels was not obviously
attributable to cell or tissue type. We identified some gene clusters
that were enriched for genes involved in specific cellular processes;
the variation in their expression levels may reflect corresponding
differences in activity of these processes in the cell lines. For
example, a cluster of 1,159 genes (Fig. 2a) included many whose
products are necessary for progression through the cell cycle (such as
CCNA1, MCM106 and MAD2L1), RNA processing and translation machinery
(such as RNA helicases, hnRNPs and translation elongation factors) and
traditional pathologic markers used to identify proliferating cells
(MKI67). Within this large cluster were smaller clusters enriched for
genes with more specialized roles. One cluster was highly enriched for
numerous ribosomal genes, whereas another was more enriched for genes
encoding RNA-splicing factors. The variation in expression of these
ribosomal genes was significantly correlated with variation in the
cell doubling time (correlation coefficient of 0.54), supporting the
notion that the genes in this cluster were regulated in relation to
cell proliferation rate or growth rate in these cell lines.
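To make the idea of correlating a gene cluster with doubling time concrete, here is a minimal sketch. The text does not say how the cluster's variation was summarized, so this example assumes one simple approach: average the centred log ratios of the cluster's genes within each cell line, then compute the Pearson correlation of that profile with doubling time. The function names and all numbers are hypothetical. In these toy numbers r comes out negative, because expression falls as doubling time rises.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cluster_profile(log_ratios):
    """Average log ratio per cell line over all genes in a cluster.

    log_ratios: list of rows (one per gene), each holding one value
    per cell line (a hypothetical layout, not the authors' database format).
    """
    n_lines = len(log_ratios[0])
    return [sum(row[j] for row in log_ratios) / len(log_ratios)
            for j in range(n_lines)]

# Hypothetical data: 3 ribosomal genes x 5 cell lines, plus doubling times (h).
ribosomal = [
    [0.8, 0.3, -0.1, -0.4, -0.6],
    [0.6, 0.4, 0.0, -0.5, -0.5],
    [0.7, 0.2, -0.2, -0.3, -0.4],
]
doubling_hours = [18.0, 24.0, 30.0, 40.0, 55.0]

profile = cluster_profile(ribosomal)
r = pearson(profile, doubling_hours)
print(f"r = {r:.2f}")
```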

In a smaller gene cluster (Fig. 2d), all of the named genes were
previously known to be regulated by interferons 13,36. Additional
groups of interferon-regulated genes showed distinct patterns of
expression (data not shown), suggesting that the NCI60 cell lines
exhibited variation in activity of interferon-response pathways,
which was reflected in gene expression patterns 36.

Another cluster (Fig. 2c) contained several genes encoding proteins
with possible interrelated roles in drug metabolism, including
glutamate-cysteine ligase (GLCLC, the enzyme responsible for the
rate-limiting step of glutathione synthesis), thioredoxin (TXN) and
thioredoxin reductase (TXNRD1; enzymes involved in regulating redox
state in cells), and MRP1 (a drug transporter known to efficiently
transport glutathione-conjugated compounds 37). The elevated
expression of this set of genes in a subset of these cell lines may
reflect selection for resistance to chemotherapeutics.

Cell lines facilitate interpretation of gene expression patterns in complex clinical samples

Like many other types of cancer, tumours of the breast typically have
a complex histological organization, with connective tissue and
leukocytic infiltrates interwoven with tumour cells. To explore the
possibility that variation in gene expression in the tumour cell lines
might provide a framework for interpreting the expression patterns in
tumour specimens, we clustered together the expression profiles of RNA
isolated from two breast cancer biopsy samples, a sample of normal
breast tissue and the NCI60 cell lines derived from breast cancers
(excluding MDA-MB-435 and MDA-N) and leukaemias (Fig. 4). This
clustering highlighted features of the gene expression pattern shared
between the cancer specimens and individual cell lines derived from
breast cancers and leukaemias.

The genes encoding keratin 8 (KRT8) and keratin 19 (KRT19), as well
as most of the other 'epithelial' genes defined in the complete NCI60
cell line cluster, were expressed in both of the biopsy samples and in
the two breast-derived cell lines that express the oestrogen receptor,
MCF-7 and T47D, suggesting that these transcripts originated in tumour
cells with features similar to those of luminal epithelial cells
(Fig. 5a). Expression of a set of genes characteristic of stromal
cells, including collagen genes (COL3A1, COL5A1 and COL6A1) and smooth
muscle cell markers (TAGLN), was a feature shared by the tumour
samples and the stromal-like cell lines Hs578T and BT549 (Fig. 5b).
This feature of the expression pattern seen in the tumour samples is
likely to be due to the stromal component of the tumour. The tumours
also shared expression of a set of genes (Fig. 5c) with the multiple
myeloma cell line (RPMI-8226), notably including
immunoglobulin genes, consistent with the presence of B cells in the
tumour (this was confirmed by staining with anti-immunoglobulin
antibodies; data not shown). Therefore, distinct sets of genes with
co-varying expression among the samples (Fig. 4, arrow) appear to
represent distinct cell types that can be distinguished in breast
cancer tissue. A fourth cluster of genes, more highly expressed in all
of the cell lines than in any of the clinical specimens, was enriched
for genes present in the 'proliferation' cluster described above
(Fig. 5d). The variation in expression of these genes likely
paralleled the difference in proliferation rate between the rapidly
cycling cultured cell lines and the much more slowly dividing cells in
tissues.

Fig. 3 Gene clusters related to tissue characteristics in the cell lines. Enlargements of the regions of the cluster diagram in Fig. 1 showing gene clusters enriched
for genes expressed in cell lines of ostensibly similar origins. a, Cluster of genes highly expressed in the leukaemia-derived cell lines. Two sub-clusters distinguish
genes that were expressed in most leukaemia-derived lines from those expressed exclusively in the erythroblastoid line K562 (note that the triplicate hybridizations
cluster together). b, Cluster of genes highly expressed in all colon (7/7) cell lines and all breast-derived cell lines positive for the oestrogen receptor (2/2). This
set of genes was also moderately expressed in most ovarian lines (5/6) and some non-small-cell-lung (4/6) lines, but was expressed at a lower level in all
renal-cancer-derived lines. c, Cluster of genes highly expressed in most melanoma-derived lines (6/7) and two related lines ostensibly derived from breast cancer
(MDA-MB-435 and MDA-N). d, Cluster of genes highly expressed in all glioblastoma (6/6) lines and most lines derived from renal-cell carcinoma (7/8), and more
moderately expressed in a subset of carcinoma-derived lines. In all panels, names are shown only for known genes whose identities were independently re-verified
by sequencing. The number of sequence-validated ESTs within the cluster is indicated below the cluster in parentheses. The position of gene names in the adjacent
list only approximates their position in the cluster diagram, as indicated by the lines connecting the colour chart with the gene list. Complete cluster images with
all gene names and accession numbers are available (http://genome-www.stanford.edu/nci60).

Discussion 

Newly available genomics tools allowed us to explore variation in gene
expression on a genomic scale in 60 cell lines derived from diverse
tumour tissues. We used a simple cluster analysis to identify the
prominent features in the gene expression patterns that appeared to
reflect 'molecular signatures' of the tissue from which the cells
originated. The histological characteristics of the cell lines that
dominated the clustering were pervasive enough that similar
relationships were revealed when alternative subsets of genes were
selected for analysis. Additional features of the expression pattern
may be related to variation in physiological attributes such as
proliferation rate and activity of interferon-response pathways.

The properties of the tumour-derived cell lines in this study have
presumably all been shaped by selection for resistance to host
defences and chemotherapeutics and for rapid proliferation in the
tissue culture environment of synthetic growth media, fetal bovine
serum and a polystyrene substratum. But the primary identifiable
factor accounting for variation in gene expression patterns among
these 60 cell lines was the identity of the tissue from which each
cell line was ostensibly derived. For most of the cell lines we
examined, neither physiological nor experimental adaptation for growth
in culture was sufficient to overwrite the gene expression programs
established during differentiation in vivo. Nevertheless, the
prominence of mesenchymal features in the cell lines isolated from
glioblastomas and carcinomas may reflect a selection for the relative
ease of establishment of cell lines expressing stromal
characteristics, perhaps combined with physiological adaptation to
tissue culture conditions 38-40.



Fig. 4 Comparison of the gene expression patterns in clinical breast cancer
specimens and cultured breast cancer and leukaemia cell lines. a, Two-dimensional
hierarchical clustering applied to gene expression data for two breast
cancer specimens, a lymph node metastasis from one patient, normal breast
and the NCI60 breast- and leukaemia-derived cell lines. The gene expression
data from tissue specimens were clustered along with expression data from a
subset of the NCI60 cell lines to explore whether features of expression
patterns observed in specific lines could be identified in the tissue samples.
Labels indicate gene clusters (shown in detail in Fig. 5) that may be related
to specific cellular components of the tumour specimens. b, Breast cancer
specimen 16 stained with anti-keratin antibodies, showing the complex mix of
cell types characteristically found in breast tumours. The arrows highlight
the different cellular components of this tissue specimen that were
distinguished by the gene expression cluster analysis (Fig. 5).



Biological themes linking genes with related expression patterns may
be inferred in many cases from the shared attributes of known genes
within the clusters. Uncharacterized cDNAs are likely to encode
proteins that have roles similar to those of the known gene products
with which they appear to be co-regulated. Still, for several clusters
of genes, we were unable to discern a common theme linking the
identified members of the cluster. Further exploration of their
variation in expression under more diverse conditions and more
comprehensive investigation of the physiology of the NCI60 cells may
provide insight 10. The relationship of the gene expression patterns
to the drug sensitivity patterns measured by the DTP is an example of
linking variation in gene
expression with more subtle and diverse phenotypic variation 1 *. 

The patterns of gene expression measured in the NCI60 cell lines
provide a framework that helps to distinguish the cells that express
specific sets of genes in the histologically complex breast cancer
specimens 41. Although it is now feasible to analyse gene expression
in micro-dissected tumour specimens 42,43, this observation suggests
that it will be possible to explore and interpret some of the biology
of clinical tumour samples by sampling them intact. As in conventional
morphological pathology, one might be able to observe interactions
between a tumour and its microenvironment in this way. These
relationships will be clarified by suitable analysis of gene
expression patterns from intact as well as dissected
tumours 12,14,15,41.

Methods 

cDNA clones. We obtained the 9,703 human cDNA clones (Research
Genetics) used in these experiments as bacterial colonies in 96-well
microtitre plates 9. Approximately 8,000 distinct UniGene clusters
(representing nominally unique genes) were represented in this set of
clones. All genes identified here by name represent clones whose
identities were confirmed by re-sequencing, or by the criterion that
two or more independent cDNA clones ostensibly representing the same
gene had nearly identical gene expression patterns. A single-pass 3'
sequence re-verification was attempted for every clone after
re-streaking for single colonies. For a subset of genes for which
good-quality 3' sequence was not obtained, we attempted to confirm
identities by 5' sequencing. Of the subset of clones selected for 5'
sequence verification on the basis of an interesting pattern of
expression (888 total), 331 were correctly identified, 57 were
incorrectly identified and 500 were indeterminate (poor-quality
sequence). We estimated that 15-20% of array elements contained DNA
representing more than one clone per well. So far, the identities of
~3,000 clones have been verified. The full list of clones used and
their nominal identities are available (gene names preceded by the
designation "SID#" (Stanford Identification) represent clones whose
identities have not yet been verified;
http://genome-www.stanford.edu:8000/nci60).

Production of cDNA microarrays. The arrays used in this experiment
were produced at Synteni Inc. (now Incyte Pharmaceuticals). Each
insert was amplified from a bacterial colony by sampling 1 µl of
bacterial media and performing PCR amplification of the insert using
consensus primers for the three plasmids represented in the clone set
(5'-TTGTAAAACGACGGCCAGTG-3', 5'-CACACAGGAAACAGCTATG-3'). Each PCR
product (100 µl) was purified by gel exclusion, concentrated and
resuspended in 3× SSC (10 µl). The PCR products were then printed on
treated glass microscope slides using a robot with four printing tips.
Detailed protocols for assembling and operating a microarray printer,
and for printing and experimental application of DNA microarrays, are
available (http://cmgm.stanford.edu/pbrown).

Preparation of mRNA and reference pool. Cell lines were grown from
NCI DTP frozen stocks in RPMI-1640 supplemented with phenol red,
glutamine (2 mM) and 5% fetal calf serum. To minimize the contribution
of variations in culture conditions or cell density to differential
gene expression, we grew each cell line to 80% confluence and isolated
mRNA 24 h after transfer to fresh medium. The time between removal
from the incubator and lysis of the cells in RNA stabilization buffer
was minimized (<1 min). Cells were lysed in buffer containing
guanidinium isothiocyanate, and total RNA was purified with the RNeasy
purification kit (Qiagen). We purified mRNA as needed using a poly(A)
purification kit (Oligotex, Qiagen) according to the manufacturer's
instructions. We assessed the integrity of the mRNA and its relative
contamination with ribosomal RNA by denaturing agarose gel
electrophoresis.

The breast tumours were surgically excised from patients and rapidly
transported to the pathology laboratory, where samples for microarray
analysis were quickly frozen in liquid nitrogen and stored at -80 °C
until use. A frozen tumour specimen was removed from the freezer, cut
into small pieces (~50-100 mg each), immediately placed into 10-12 ml
of Trizol reagent (Gibco-BRL) and homogenized using a PowerGen 125
Tissue Homogenizer (Fisher Scientific), starting at 5,000 r.p.m. and
gradually increasing to ~20,000 r.p.m. over a period of 30-60 s. We
processed the Trizol/tumour homogenate as described in the Trizol
protocol, including an initial step to remove fat. Once total RNA was
obtained, we isolated mRNA with a FastTrack 2.0 kit (Invitrogen) using
the manufacturer's protocol for isolating mRNA starting from total
RNA. The normal breast samples were obtained from Clontech.
Fig. 5 Histologic features of breast cancer biopsies can be recognized and parsed based on gene expression patterns. Enlargements of the regions of the cluster
diagram in Fig. 4 showing gene clusters enriched for genes expressed in different cell types in the breast cancer specimens, as distinguished by clustering with the
cultured cell lines. a, A cluster including many genes characteristic of epithelial cells, expressed in the tumours and in cell lines (T47D and MCF7) derived from
breast cancers positive for the oestrogen receptor. b, Genes expressed in tumour specimens and in cell lines derived from breast cancer with stromal cell
characteristics (Hs578T and BT549). Expression of these genes in the tumour samples may reflect the presence of myofibroblasts in the cancer specimen stroma.
c, Genes expressed in leukocyte-derived cell lines, showing a common leukocyte gene cluster and separate 'myeloid' and 'B-cell' gene clusters. d, Genes that were
relatively highly expressed in all cell lines compared with the tumour specimens and normal breast. The higher expression of this set of genes involved in cell-cycle
transit in the cell lines is likely to reflect the higher proliferative rate of cells cultured in the presence of serum compared with the average proliferation rate of
cells in the biopsied tissue.
We combined mRNA from the following cells in equal quantities to make
the reference pool: HL-60 (acute myeloid leukaemia) and K562 (chronic
myeloid leukaemia); NCI-H226 (non-small-cell lung); COLO 205 (colon);
SNB-19 (central nervous system); LOX-IMVI (melanoma); OVCAR-3 and
OVCAR-4 (ovarian); CAKI-1 (renal); PC-3 (prostate); and MCF7 and
Hs578T (breast). The criteria for selection of the cell lines in the
reference are described in detail in the accompanying manuscript 12.

Doubling-time calculations. We calculated doubling times from routine
NCI60 cell line compound screening data; they reflect the doubling
times for cells inoculated into 96-well plates at the screening
inoculation densities and grown in RPMI 1640 medium supplemented with
5% fetal bovine serum for 48 h. We measured cell populations using the
sulforhodamine B optical density assay. The growth rate constant $k$
was calculated using the equation $N/N_0 = e^{kt}$, where $N_0$ is the
optical density for control (untreated) cells at time zero, $N$ is the
optical density for control cells after 48-h incubation, and $t$ is
48 h. The same equation was then used with the derived $k$ to
calculate the doubling time $t$ by setting $N/N_0 = 2$, which gives
$t = \ln 2 / k$. For a given cell line, we obtained $N_0$ and $N$ values
by averaging optical densities (n > 6,000) obtained for each cell line
over a year's screening. Data and experimental details are available
(http://dtp.nci.nih.gov).
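The two-step calculation (solve for $k$ over the 48-h window, then set $N/N_0 = 2$) can be sketched in Python as follows. This assumes the averaged optical densities are already in hand. The function name and the example OD values are hypothetical and are not taken from DTP data.

```python
import math

def doubling_time_hours(od_start, od_end, elapsed_hours=48.0):
    """Doubling time from exponential growth, N/N0 = exp(k * t).

    od_start: mean optical density of control cells at time zero (N0)
    od_end:   mean optical density of control cells after elapsed_hours (N)
    """
    # Step 1: solve N/N0 = exp(k * t) for the growth rate constant k (per hour).
    k = math.log(od_end / od_start) / elapsed_hours
    # Step 2: set N/N0 = 2 and solve for t, i.e. t = ln(2) / k.
    return math.log(2) / k

# Hypothetical averaged ODs: an eightfold increase in 48 h is three
# doublings, so the doubling time should be 16 h.
print(round(doubling_time_hours(0.25, 2.0), 6))
```

Because only the ratio $N/N_0$ enters the calculation, any constant scaling of the optical-density readings cancels out.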

Preparation and hybridization of fluorescent labelled cDNA. For each
comparative array hybridization, labelled cDNA was synthesized by
reverse transcription from test cell mRNA in the presence of
Cy5-dUTP, and from the reference mRNA with Cy3-dUTP, using the
Superscript II reverse-transcription kit (Gibco-BRL). For each reverse
transcription reaction, mRNA (2 µg) was mixed with an anchored
oligo-dT primer (dT20-d(AGC); 4 µg) in a total volume of 15 µl,
heated to 70 °C for 10 min and cooled on ice. To this sample, we added
an unlabelled nucleotide pool (0.6 µl; 25 mM each dATP, dCTP and dGTP,
and 15 mM dTTP), either Cy3- or Cy5-conjugated dUTP (3 µl; 1 mM;
Amersham), 5× first-strand buffer (6 µl; 250 mM Tris-HCl, pH 8.3,
375 mM KCl, 15 mM MgCl2), 0.1 M DTT (3 µl) and 2 µl of Superscript II
reverse transcriptase (200 U/µl). After a 2-h incubation at 42 °C, the
RNA was degraded by adding 1 N NaOH (1.5 µl) and incubating at 70 °C
for 10 min. The mixture was neutralized by adding 1 N HCl (1.5 µl),
and the volume was brought to 500 µl with TE (10 mM Tris, 1 mM EDTA).
We added Cot-1 human DNA (20 µg; Gibco-BRL) and purified the probe by
centrifugation in a Centricon-30 micro-concentrator (Amicon). The two
separate probes were combined, brought to a volume of 500 µl, and
concentrated again to a volume of less than 7 µl. We added poly(A)
RNA (1 µl of 10 µg/µl; Sigma) and tRNA (10 µg/µl; Gibco-BRL), and
adjusted the volume to 9.5 µl with distilled water. For final probe
preparation, 20× SSC (2.1 µl; 1.5 M NaCl, 150 mM sodium citrate,
pH 8.0) and 10% SDS (0.35 µl) were added to a total final volume of
12 µl. The probes were denatured by heating for 2 min at 100 °C,
incubated at 37 °C for 20-30 min, and placed on the array under a
22 mm × 22 mm glass coverslip. We incubated slides overnight
(14-18 h) at 65 °C in a custom slide chamber with humidity maintained
by a small reservoir of 3× SSC. Arrays were washed by submersion and
agitation for 2-5 min in 2× SSC with 0.1% SDS, followed by 1× SSC and
then 0.1× SSC. The arrays were spun dry by centrifugation for 2 min in
a slide rack in a Beckman GS-6 tabletop centrifuge in Microplus
carriers at 650 r.p.m.
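The following sketch checks the arithmetic of the final probe mix. It assumes the 9.5-µl probe, 2.1 µl of 20× SSC and 0.35 µl of 10% SDS are the only components, and the helper name is illustrative. Under those assumptions the additions total about 12 µl, giving roughly 3.5× SSC and 0.3% SDS in the hybridization mix.

```python
def final_concentration(stock, added_ul, total_ul):
    """Concentration after diluting added_ul of a stock into total_ul."""
    return stock * added_ul / total_ul

probe_ul, ssc_ul, sds_ul = 9.5, 2.1, 0.35
total_ul = probe_ul + ssc_ul + sds_ul                 # 11.95 ul, reported as ~12 ul
ssc_x = final_concentration(20.0, ssc_ul, total_ul)   # fold-concentration of SSC
sds_pct = final_concentration(10.0, sds_ul, total_ul) # percent SDS (w/v)

print(f"{total_ul:.2f} ul, {ssc_x:.2f}x SSC, {sds_pct:.2f}% SDS")
```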

Array quantitation and data processing. Following hybridization,
arrays were scanned using a laser-scanning microscope (ref. 17;
http://cmgm.stanford.edu/pbrown). Separate images were acquired for
Cy3 and Cy5. We carried out data reduction with the program ScanAlyze
(M.B.E., available at http://rana.stanford.edu/software). Each spot
was defined by manual positioning of a grid of circles over the array
image. For each fluorescent image, the average pixel intensity within
each circle was determined, and a local background was computed for
each spot equal to the median pixel intensity in a square 40 pixels
wide and 40 pixels high centred on the spot centre, excluding all
pixels within any defined spots. Net signal was determined by
subtracting this local background from the average intensity for each
spot. Spots deemed unsuitable for accurate quantitation because of
array artefacts were manually flagged and excluded from further
analysis. Data files generated by ScanAlyze were entered into a custom
database that maintains web-accessible files. Signal intensities
between the two fluorescent images were normalized by applying a
uniform scale factor to all intensities measured for the Cy5 channel.
The normalization factor was chosen so that the mean log(Cy3/Cy5) for
a subset of spots that achieved a minimum quality parameter
(approximately 6,000 spots) was 0. This effectively defined the
signal-intensity-weighted 'average' spot on each array to have a
Cy3/Cy5 ratio of 1.0.

Cluster analysis. We extracted tables (rows of genes, columns of
individual microarray hybridizations) of normalized fluorescence
ratios from the database. Various selection criteria, discussed in
relation to each data set, were applied to select subsets of genes
from the 9,703 cDNA elements on the arrays. Before clustering and
display, the logarithms of the measured fluorescence ratios for each
gene were centred by subtracting the arithmetic mean of all log ratios
measured for that gene. Because the reference pool contributes the
same constant offset to every log ratio for a given gene, this
centring makes all subsequent analyses independent of the amount of
each gene's mRNA in the reference pool.
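The per-gene centring step can be sketched in Python as follows. The dictionary layout, the function name and the choice of base-2 logarithms are assumptions made for illustration, and missing values are not handled. The example shows why the reference amount drops out. Scaling all of a gene's ratios by a constant (as a different amount of that gene's mRNA in the reference would) shifts every log ratio by the same amount, and subtracting the mean removes that shift.

```python
import math

def centre_log_ratios(ratios):
    """Log-transform each gene's ratios and subtract that gene's mean log ratio.

    ratios: dict mapping gene name -> list of ratios, one per hybridization
    (a hypothetical in-memory layout).
    """
    centred = {}
    for gene, row in ratios.items():
        logs = [math.log2(r) for r in row]
        mean = sum(logs) / len(logs)
        centred[gene] = [v - mean for v in logs]
    return centred

original = centre_log_ratios({"geneA": [1.0, 2.0, 4.0, 8.0]})["geneA"]
# Same gene, but every ratio is threefold higher (a different reference amount).
rescaled = centre_log_ratios({"geneA": [3.0, 6.0, 12.0, 24.0]})["geneA"]
print(original)
print(rescaled)
```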

We applied a hierarchical clustering algorithm separately to the cell
lines and genes using the Pearson correlation coefficient as the
measure of similarity and average linkage clustering 3,19-21. The
results of this process are two dendrograms (trees), one for the cell
lines and one for the genes, in which very similar elements are
connected by short branches, and longer branches join elements with
diminishing degrees of similarity. For visual display, the rows and
columns in the initial data table were reordered to conform to the
structures of the dendrograms obtained from the cluster analysis. Each
cell in the cluster-ordered data table was replaced by a graded colour
(pure red through black to pure green), representing the
mean-adjusted ratio value in the cell. Gene labels in cluster diagrams
are displayed here only for genes that were represented in the
microarray by sequence-verified cDNAs. A complete software
implementation of this process is available
(http://rana.stanford.edu/software), as well as all clustering results
(http://genome-www.stanford.edu/nci60).
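A naive sketch of average-linkage hierarchical clustering with the Pearson correlation as the similarity measure is shown below. This is an illustration written for clarity, not the authors' software. The function names and toy profiles are hypothetical, and the code assumes every profile is non-constant so that the correlation is defined. At each step it merges the two clusters with the highest mean pairwise correlation. Concatenating members at each merge yields one valid leaf order for displaying the rows in dendrogram order.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / den

def average_linkage(profiles):
    """Agglomerative clustering with Pearson r as similarity.

    profiles: dict name -> list of centred log ratios.
    Returns merges as (cluster_a, cluster_b, similarity); clusters are
    tuples of member names, so the last merge gives a display order.
    """
    names = list(profiles)
    sim = {(a, b): pearson(profiles[a], profiles[b])
           for a, b in combinations(names, 2)}

    def pair_sim(a, b):
        return sim[(a, b)] if (a, b) in sim else sim[(b, a)]

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Average linkage: mean similarity over all cross-cluster pairs.
            s = sum(pair_sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))
            if best is None or s > best[2]:
                best = (c1, c2, s)
        c1, c2, _ = best
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append(best)
    return merges

# Toy profiles: g1/g2 rise across four samples, g3/g4 fall.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.1, 2.1, 2.9, 4.2],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [3.9, 3.1, 2.2, 0.8],
}
merges = average_linkage(profiles)
last_a, last_b, _ = merges[-1]
leaf_order = last_a + last_b
print(leaf_order)
```

The loop recomputes cluster similarities from scratch at every step, which is fine for a toy example. For thousands of genes, an implementation that updates similarities incrementally would be needed.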